This check monitors Nvidia Triton through the Datadog Agent.
Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the Autodiscovery Integration Templates for guidance on applying these instructions.
The Nvidia Triton check is included in the Datadog Agent package. No additional installation is needed on your server.
By default, the Nvidia Triton server exposes all metrics through the Prometheus endpoint. To enable metrics reporting:
tritonserver --allow-metrics=true
To change the metric endpoint, use the --metrics-address option.
Example:
tritonserver --metrics-address=http://0.0.0.0:8002
In this case, the OpenMetrics endpoint is exposed at this URL: http://<NVIDIA_TRITON_ADDRESS>:8002/metrics.
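Before pointing the Agent at the endpoint, it can help to confirm that the server returns parseable metrics. The following is a minimal sketch, not part of the integration: `parse_metrics` and `fetch_metrics` are hypothetical helper names, the model name `demo` is made up, and the parser assumes samples carry no trailing timestamps.

```python
from urllib.request import urlopen


def parse_metrics(text):
    """Parse Prometheus/OpenMetrics exposition text into {name: [(labels, value), ...]}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The sample value is the last whitespace-separated field
        # (this sketch assumes no trailing timestamps).
        series, value = line.rsplit(None, 1)
        name, _, labels = series.partition("{")
        samples.setdefault(name, []).append((labels.rstrip("}"), float(value)))
    return samples


def fetch_metrics(url, timeout=5):
    """Download the exposition text from a metrics endpoint."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Illustrative payload with a hypothetical model; a real server returns many more series.
SAMPLE = """\
# HELP nv_inference_request_success Number of successful inference requests, all batch sizes
# TYPE nv_inference_request_success counter
nv_inference_request_success{model="demo",version="1"} 12
nv_inference_request_failure{model="demo",version="1"} 1
"""

parsed = parse_metrics(SAMPLE)
print(sorted(parsed))
```

Against a live server, the same parser can be fed with `parse_metrics(fetch_metrics("http://<NVIDIA_TRITON_ADDRESS>:8002/metrics"))`, substituting the real address.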
The latency summary metrics are disabled by default. To enable summary metrics for latencies, use the command below:
tritonserver --metrics-config summary_latencies=true
The response cache metrics are not reported by default. You need to enable a cache implementation on the server side by specifying a <cache_implementation> and corresponding configuration.
For instance:
tritonserver --cache-config local,size=1048576
Nvidia Triton can also expose custom metrics through its OpenMetrics endpoint. Datadog can collect these custom metrics using the extra_metrics option.
Edit the nvidia_triton.d/conf.yaml file, in the conf.d/ folder at the root of your Agent’s configuration directory, to start collecting your nvidia_triton performance data. See the sample nvidia_triton.d/conf.yaml for all available configuration options.
Run the Agent’s status subcommand and look for nvidia_triton under the Checks section.
nvidia_triton.cache.insertion.duration (gauge) | Total cache insertion duration, in microseconds Shown as microsecond |
nvidia_triton.cache.lookup.duration (gauge) | Total cache lookup duration (hit and miss), in microseconds Shown as microsecond |
nvidia_triton.cache.num.entries (gauge) | Number of responses stored in response cache |
nvidia_triton.cache.num.evictions (gauge) | Number of cache evictions in response cache |
nvidia_triton.cache.num.hits (gauge) | Number of cache hits in response cache |
nvidia_triton.cache.num.lookups (gauge) | Number of cache lookups in response cache |
nvidia_triton.cache.num.misses (gauge) | Number of cache misses in response cache |
nvidia_triton.cache.util (gauge) | Cache utilization [0.0 - 1.0] |
nvidia_triton.cpu.memory.total_bytes (gauge) | CPU total memory (RAM), in bytes Shown as byte |
nvidia_triton.cpu.memory.used_bytes (gauge) | CPU used memory (RAM), in bytes Shown as byte |
nvidia_triton.cpu.utilization (gauge) | CPU utilization rate [0.0 - 1.0] |
nvidia_triton.energy.consumption.count (count) | GPU energy consumption in joules since the Triton Server started |
nvidia_triton.gpu.memory.total_bytes (gauge) | GPU total memory, in bytes Shown as byte |
nvidia_triton.gpu.memory.used_bytes (gauge) | GPU used memory, in bytes Shown as byte |
nvidia_triton.gpu.power.limit (gauge) | GPU power management limit in watts Shown as watt |
nvidia_triton.gpu.power.usage (gauge) | GPU power usage in watts Shown as watt |
nvidia_triton.gpu.utilization (gauge) | GPU utilization rate [0.0 - 1.0] |
nvidia_triton.inference.compute.infer.duration_us.count (count) | Cumulative compute inference duration in microseconds (does not include cached requests) Shown as microsecond |
nvidia_triton.inference.compute.infer.summary_us.count (count) | Cumulative compute inference duration in microseconds (count) (does not include cached requests) Shown as microsecond |
nvidia_triton.inference.compute.infer.summary_us.quantile (gauge) | Cumulative compute inference duration in microseconds (quantile) (does not include cached requests) Shown as microsecond |
nvidia_triton.inference.compute.infer.summary_us.sum (count) | Cumulative compute inference duration in microseconds (sum) (does not include cached requests) Shown as microsecond |
nvidia_triton.inference.compute.input.duration_us.count (count) | Cumulative compute input duration in microseconds (does not include cached requests) Shown as microsecond |
nvidia_triton.inference.compute.input.summary_us.count (count) | Cumulative compute input duration in microseconds (count) (does not include cached requests) Shown as microsecond |
nvidia_triton.inference.compute.input.summary_us.quantile (gauge) | Cumulative compute input duration in microseconds (quantile) (does not include cached requests) Shown as microsecond |
nvidia_triton.inference.compute.input.summary_us.sum (count) | Cumulative compute input duration in microseconds (sum) (does not include cached requests) Shown as microsecond |
nvidia_triton.inference.compute.output.duration_us.count (count) | Cumulative inference compute output duration in microseconds (does not include cached requests) Shown as microsecond |
nvidia_triton.inference.compute.output.summary_us.count (count) | Cumulative inference compute output duration in microseconds (count) (does not include cached requests) Shown as microsecond |
nvidia_triton.inference.compute.output.summary_us.quantile (gauge) | Cumulative inference compute output duration in microseconds (quantile) (does not include cached requests) Shown as microsecond |
nvidia_triton.inference.compute.output.summary_us.sum (count) | Cumulative inference compute output duration in microseconds (sum) (does not include cached requests) Shown as microsecond |
nvidia_triton.inference.count.count (count) | Number of inferences performed (does not include cached requests) |
nvidia_triton.inference.exec.count.count (count) | Number of model executions performed (does not include cached requests) |
nvidia_triton.inference.pending.request.count (gauge) | Instantaneous number of pending requests awaiting execution per-model. |
nvidia_triton.inference.queue.duration_us.count (count) | Cumulative inference queuing duration in microseconds (includes cached requests) Shown as microsecond |
nvidia_triton.inference.queue.summary_us.count (count) | Summary of inference queuing duration in microseconds (count) (includes cached requests) Shown as microsecond |
nvidia_triton.inference.queue.summary_us.quantile (gauge) | Summary of inference queuing duration in microseconds (quantile) (includes cached requests) Shown as microsecond |
nvidia_triton.inference.queue.summary_us.sum (count) | Summary of inference queuing duration in microseconds (sum) (includes cached requests) Shown as microsecond |
nvidia_triton.inference.request.duration_us.count (count) | Cumulative inference request duration in microseconds (includes cached requests) Shown as microsecond |
nvidia_triton.inference.request.summary_us.count (count) | Summary of inference request duration in microseconds (count) (includes cached requests) Shown as microsecond |
nvidia_triton.inference.request.summary_us.quantile (gauge) | Summary of inference request duration in microseconds (quantile) (includes cached requests) Shown as microsecond |
nvidia_triton.inference.request.summary_us.sum (count) | Summary of inference request duration in microseconds (sum) (includes cached requests) Shown as microsecond |
nvidia_triton.inference.request_failure.count (count) | Number of failed inference requests, all batch sizes |
nvidia_triton.inference.request_success.count (count) | Number of successful inference requests, all batch sizes |
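Several of the metrics above are cumulative, so they are most useful as ratios over the same time window. The sketch below shows two such derived values under stated assumptions: the function names are hypothetical, the inputs are the per-window values of the named metrics, and dividing request duration by the successful-request count is one reasonable choice of denominator rather than a documented formula.

```python
def ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    if denominator == 0:
        return None
    return numerator / denominator


def cache_hit_ratio(hits, lookups):
    """Share of cache lookups that were hits.

    hits    -> nvidia_triton.cache.num.hits
    lookups -> nvidia_triton.cache.num.lookups
    """
    return ratio(hits, lookups)


def average_request_latency_us(duration_us, requests):
    """Mean request latency in microseconds over one window.

    duration_us -> nvidia_triton.inference.request.duration_us.count
    requests    -> nvidia_triton.inference.request_success.count (assumed denominator)
    """
    return ratio(duration_us, requests)


print(cache_hit_ratio(30, 40))
print(average_request_latency_us(5000, 4))
```

Returning `None` for an empty window avoids reporting a misleading zero when no lookups or requests occurred.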
The Nvidia Triton integration does not include any events.
nvidia_triton.openmetrics.health
Returns CRITICAL if the Agent is unable to connect to the Nvidia Triton OpenMetrics endpoint, otherwise returns OK.
Statuses: ok, critical
nvidia_triton.health.status
Returns CRITICAL if the server returns a 4xx or 5xx response, OK if the response is 200, and UNKNOWN for everything else.
Statuses: ok, warning, critical
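The rule described for nvidia_triton.health.status can be written as a small mapping from HTTP status code to check status. This is an illustrative sketch of the rule as stated in the text, not the Agent's actual implementation; `health_status` is a hypothetical name.

```python
def health_status(http_status):
    """Map an HTTP status code to a service check status, per the rule in the text."""
    # Any 4xx or 5xx response is treated as critical.
    if 400 <= http_status <= 599:
        return "critical"
    # Only a 200 response counts as healthy.
    if http_status == 200:
        return "ok"
    # Everything else (for example 204 or a redirect) is unknown.
    return "unknown"


for code in (200, 404, 503, 302):
    print(code, health_status(code))
```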
The Nvidia Triton integration can collect logs from the Nvidia Triton server and forward them to Datadog.
Collecting logs is disabled by default in the Datadog Agent. Enable it in your datadog.yaml file:
logs_enabled: true
Uncomment and edit the logs configuration block in your nvidia_triton.d/conf.yaml file. Here’s an example:
logs:
  - type: docker
    source: nvidia_triton
    service: nvidia_triton
Collecting logs is disabled by default in the Datadog Agent. To enable it, see Kubernetes Log Collection.
Then, set Log Integrations as pod annotations. This can also be configured with a file, a configmap, or a key-value store. For more information, see the configuration section of Kubernetes Log Collection.
Annotations v1/v2
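apiVersion: v1
kind: Pod
metadata:
  name: nvidia-triton
  annotations:
    # The name in the annotation key must match the container name under spec.containers.
    ad.datadoghq.com/nvidia-triton.logs: '[{"source":"nvidia_triton","service":"nvidia_triton"}]'
spec:
  containers:
    - name: nvidia-triton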
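The log annotation key follows the pattern ad.datadoghq.com/<CONTAINER_NAME>.logs, so the name in the key has to match a container declared in the pod spec. As a rough sketch, the annotation can be generated from the container name to keep the two in sync; `log_annotation` is a hypothetical helper and the container name `triton` is only an example.

```python
import json


def log_annotation(container_name, source="nvidia_triton", service="nvidia_triton"):
    """Build the pod annotation that enables log collection for one container."""
    # The key embeds the container name; the value is a JSON list of log configs.
    key = f"ad.datadoghq.com/{container_name}.logs"
    value = json.dumps([{"source": source, "service": service}], separators=(",", ":"))
    return {key: value}


annotation = log_annotation("triton")
for key, value in annotation.items():
    print(f"{key}: '{value}'")
```

The printed line can be pasted under metadata.annotations in the pod manifest.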
Need help? Contact Datadog support.