nvidia_triton.cache.insertion.duration (gauge) | Total cache insertion duration, in microseconds Shown as microsecond |
nvidia_triton.cache.lookup.duration (gauge) | Total cache lookup duration (hit and miss), in microseconds Shown as microsecond |
nvidia_triton.cache.num.entries (gauge) | Number of responses stored in response cache |
nvidia_triton.cache.num.evictions (gauge) | Number of cache evictions in response cache |
nvidia_triton.cache.num.hits (gauge) | Number of cache hits in response cache |
nvidia_triton.cache.num.lookups (gauge) | Number of cache lookups in response cache |
nvidia_triton.cache.num.misses (gauge) | Number of cache misses in response cache |
nvidia_triton.cache.util (gauge) | Cache utilization [0.0 - 1.0] |
nvidia_triton.cpu.memory.total_bytes (gauge) | CPU total memory (RAM), in bytes Shown as byte |
nvidia_triton.cpu.memory.used_bytes (gauge) | CPU used memory (RAM), in bytes Shown as byte |
nvidia_triton.cpu.utilization (gauge) | CPU utilization rate [0.0 - 1.0] |
nvidia_triton.energy.consumption.count (count) | GPU energy consumption in joules since the Triton Server started Shown as joule |
nvidia_triton.gpu.memory.total_bytes (gauge) | GPU total memory, in bytes Shown as byte |
nvidia_triton.gpu.memory.used_bytes (gauge) | GPU used memory, in bytes Shown as byte |
nvidia_triton.gpu.power.limit (gauge) | GPU power management limit in watts Shown as watt |
nvidia_triton.gpu.power.usage (gauge) | GPU power usage in watts Shown as watt |
nvidia_triton.gpu.utilization (gauge) | GPU utilization rate [0.0 - 1.0] |
nvidia_triton.inference.compute.infer.duration_us.count (count) | Cumulative compute inference duration in microseconds (does not include cached requests) Shown as microsecond |
nvidia_triton.inference.compute.infer.summary_us.count (count) | Summary of compute inference duration in microseconds (count) (does not include cached requests) Shown as microsecond |
nvidia_triton.inference.compute.infer.summary_us.quantile (gauge) | Summary of compute inference duration in microseconds (quantile) (does not include cached requests) Shown as microsecond |
nvidia_triton.inference.compute.infer.summary_us.sum (count) | Summary of compute inference duration in microseconds (sum) (does not include cached requests) Shown as microsecond |
nvidia_triton.inference.compute.input.duration_us.count (count) | Cumulative compute input duration in microseconds (does not include cached requests) Shown as microsecond |
nvidia_triton.inference.compute.input.summary_us.count (count) | Summary of compute input duration in microseconds (count) (does not include cached requests) Shown as microsecond |
nvidia_triton.inference.compute.input.summary_us.quantile (gauge) | Summary of compute input duration in microseconds (quantile) (does not include cached requests) Shown as microsecond |
nvidia_triton.inference.compute.input.summary_us.sum (count) | Summary of compute input duration in microseconds (sum) (does not include cached requests) Shown as microsecond |
nvidia_triton.inference.compute.output.duration_us.count (count) | Cumulative compute output duration in microseconds (does not include cached requests) Shown as microsecond |
nvidia_triton.inference.compute.output.summary_us.count (count) | Summary of compute output duration in microseconds (count) (does not include cached requests) Shown as microsecond |
nvidia_triton.inference.compute.output.summary_us.quantile (gauge) | Summary of compute output duration in microseconds (quantile) (does not include cached requests) Shown as microsecond |
nvidia_triton.inference.compute.output.summary_us.sum (count) | Summary of compute output duration in microseconds (sum) (does not include cached requests) Shown as microsecond |
nvidia_triton.inference.count.count (count) | Number of inferences performed (does not include cached requests) |
nvidia_triton.inference.exec.count.count (count) | Number of model executions performed (does not include cached requests) |
nvidia_triton.inference.pending.request.count (gauge) | Instantaneous number of pending requests awaiting execution, per model |
nvidia_triton.inference.queue.duration_us.count (count) | Cumulative inference queuing duration in microseconds (includes cached requests) Shown as microsecond |
nvidia_triton.inference.queue.summary_us.count (count) | Summary of inference queuing duration in microseconds (count) (includes cached requests) Shown as microsecond |
nvidia_triton.inference.queue.summary_us.quantile (gauge) | Summary of inference queuing duration in microseconds (quantile) (includes cached requests) Shown as microsecond |
nvidia_triton.inference.queue.summary_us.sum (count) | Summary of inference queuing duration in microseconds (sum) (includes cached requests) Shown as microsecond |
nvidia_triton.inference.request.duration_us.count (count) | Cumulative inference request duration in microseconds (includes cached requests) Shown as microsecond |
nvidia_triton.inference.request.summary_us.count (count) | Summary of inference request duration in microseconds (count) (includes cached requests) Shown as microsecond |
nvidia_triton.inference.request.summary_us.quantile (gauge) | Summary of inference request duration in microseconds (quantile) (includes cached requests) Shown as microsecond |
nvidia_triton.inference.request.summary_us.sum (count) | Summary of inference request duration in microseconds (sum) (includes cached requests) Shown as microsecond |
nvidia_triton.inference.request_failure.count (count) | Number of failed inference requests, all batch sizes |
nvidia_triton.inference.request_success.count (count) | Number of successful inference requests, all batch sizes |
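
Several of these counters are mainly useful in combination: the cache hit counters become a hit rate when divided by lookups, and the cumulative duration counters become average latencies when divided by a request count over the same window. Below is a minimal sketch of those derivations; the sample values are hypothetical stand-ins for numbers you would pull from your monitoring backend.

```python
# Minimal sketch: deriving health ratios from sampled values of the
# metrics listed above. All sample numbers are hypothetical; in practice
# they would come from your monitoring backend over a fixed time window.

samples = {
    "nvidia_triton.cache.num.hits": 870.0,
    "nvidia_triton.cache.num.lookups": 1_000.0,
    "nvidia_triton.inference.request_success.count": 950.0,
    "nvidia_triton.inference.request.duration_us.count": 4_750_000.0,
    "nvidia_triton.inference.queue.duration_us.count": 950_000.0,
}

# Fraction of cache lookups that hit, comparable to the [0.0 - 1.0]
# utilization-style gauges above.
hit_rate = (samples["nvidia_triton.cache.num.hits"]
            / samples["nvidia_triton.cache.num.lookups"])

# Approximate mean end-to-end latency: cumulative request duration
# divided by the number of successful requests in the same window.
avg_request_us = (samples["nvidia_triton.inference.request.duration_us.count"]
                  / samples["nvidia_triton.inference.request_success.count"])

# Approximate mean time a request spent queued before execution.
avg_queue_us = (samples["nvidia_triton.inference.queue.duration_us.count"]
                / samples["nvidia_triton.inference.request_success.count"])

print(f"cache hit rate:      {hit_rate:.1%}")
print(f"avg request latency: {avg_request_us:,.0f} us")
print(f"avg queue time:      {avg_queue_us:,.0f} us")
```

Note that the `summary_us.quantile` gauges already report latency percentiles directly, so the ratio above is only needed when working from the cumulative `duration_us` counters.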