Overview

OpenTelemetry host metrics dashboard

To collect system metrics such as CPU, disk, and memory usage, enable the host metrics receiver in your Collector.

For more information, including supported operating systems, see the OpenTelemetry project documentation for the host metrics receiver.

Setup

Add the following lines to your Collector configuration:

receivers:
  hostmetrics:
    collection_interval: 10s
    scrapers:
      paging:
        metrics:
          system.paging.utilization:
            enabled: true
      cpu:
        metrics:
          system.cpu.utilization:
            enabled: true
      disk:
      filesystem:
        metrics:
          system.filesystem.utilization:
            enabled: true
      load:
      memory:
      network:
      processes:

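Set up the host metrics receiver on each node you want to collect metrics from. To collect host metrics from every node in a Kubernetes cluster, deploy the Collector that runs the host metrics receiver as a DaemonSet, and add the following to its configuration. This version also enables the optional system.cpu.physical.count, system.cpu.logical.count, and system.cpu.frequency metrics: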

receivers:
  hostmetrics:
    collection_interval: 10s
    scrapers:
      paging:
        metrics:
          system.paging.utilization:
            enabled: true
      cpu:
        metrics:
          system.cpu.utilization:
            enabled: true
          system.cpu.physical.count:
            enabled: true
          system.cpu.logical.count:
            enabled: true
          system.cpu.frequency:
            enabled: true
      disk:
      filesystem:
        metrics:
          system.filesystem.utilization:
            enabled: true
      load:
      memory:
      network:
      processes:

Data collected

Host metrics are collected by the host metrics receiver. For information about setting up the receiver, see OpenTelemetry Collector Datadog Exporter.

The metrics, mapped to Datadog metrics, are used in the following views:

Note: To correlate trace and host metrics, configure Universal Service Monitoring attributes for each service, and set the host.name resource attribute to the corresponding underlying host for both service and collector instances.

The following table maps each Datadog host metric name to its corresponding OpenTelemetry host metric name and, where applicable, lists the transformation applied to the OTel value to convert it to Datadog units.

| Datadog metric name | OTel metric name | Metric description | Transform done on OTel metric |
|---|---|---|---|
| system.load.1 | system.cpu.load_average.1m | The average system load over one minute. (Linux only) | |
| system.load.5 | system.cpu.load_average.5m | The average system load over five minutes. (Linux only) | |
| system.load.15 | system.cpu.load_average.15m | The average system load over 15 minutes. (Linux only) | |
| system.cpu.idle | system.cpu.utilization (attribute filter state: idle) | Fraction of time the CPU spent in an idle state. Shown as percent. | Multiplied by 100 |
| system.cpu.user | system.cpu.utilization (attribute filter state: user) | Fraction of time the CPU spent running user space processes. Shown as percent. | Multiplied by 100 |
| system.cpu.system | system.cpu.utilization (attribute filter state: system) | Fraction of time the CPU spent running the kernel. Shown as percent. | Multiplied by 100 |
| system.cpu.iowait | system.cpu.utilization (attribute filter state: wait) | The percent of time the CPU spent waiting for I/O operations to complete. | Multiplied by 100 |
| system.cpu.stolen | system.cpu.utilization (attribute filter state: steal) | The percent of time the virtual CPU spent waiting for the hypervisor to service another virtual CPU. Only applies to virtual machines. Shown as percent. | Multiplied by 100 |
| system.mem.total | system.memory.usage | The total amount of physical RAM in bytes. | Converted to MB (divided by 2^20) |
| system.mem.usable | system.memory.usage (attribute filter state: free, cached, buffered) | Value of MemAvailable from /proc/meminfo if present. If not present, falls back to adding free + buffered + cached memory. In bytes. | Converted to MB (divided by 2^20) |
| system.net.bytes_rcvd | system.network.io (attribute filter direction: receive) | The number of bytes received on a device per second. | |
| system.net.bytes_sent | system.network.io (attribute filter direction: transmit) | The number of bytes sent from a device per second. | |
| system.swap.free | system.paging.usage (attribute filter state: free) | The amount of free swap space, in bytes. | Converted to MB (divided by 2^20) |
| system.swap.used | system.paging.usage (attribute filter state: used) | The amount of swap space in use, in bytes. | Converted to MB (divided by 2^20) |
| system.disk.in_use | system.filesystem.utilization | The amount of disk space in use as a fraction of the total. | |
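The transforms in the mapping table above can be expressed as a small lookup. The following Python sketch maps a single OTel data point to its Datadog name and value. The helper names (`map_point`, `usable_memory_mb`) are hypothetical, and this only illustrates the table; it is not the Datadog exporter's implementation. It assumes each data point arrives with its attributes as a dictionary and does not aggregate across other attributes, such as the per-CPU `cpu` attribute. The network rows are left out because they describe per-second rates rather than a fixed unit conversion, and system.mem.total is left out because the table does not say how the memory states are combined.

```python
from typing import Dict, Optional, Tuple

# Divisor the table uses to convert bytes to MB.
BYTES_PER_MB = 2 ** 20

# OTel metrics that map to a Datadog name with no value transform.
RENAMES = {
    "system.cpu.load_average.1m": "system.load.1",
    "system.cpu.load_average.5m": "system.load.5",
    "system.cpu.load_average.15m": "system.load.15",
    "system.filesystem.utilization": "system.disk.in_use",
}

# system.cpu.utilization "state" values -> Datadog names (value multiplied by 100).
CPU_STATES = {
    "idle": "system.cpu.idle",
    "user": "system.cpu.user",
    "system": "system.cpu.system",
    "wait": "system.cpu.iowait",
    "steal": "system.cpu.stolen",
}

# system.paging.usage "state" values -> Datadog names (bytes converted to MB).
SWAP_STATES = {"free": "system.swap.free", "used": "system.swap.used"}

# system.memory.usage "state" values summed into system.mem.usable.
USABLE_MEMORY_STATES = {"free", "cached", "buffered"}


def map_point(name: str, attributes: Dict[str, str], value: float) -> Optional[Tuple[str, float]]:
    """Return (Datadog name, transformed value) for one data point, or None if unmapped.

    Attributes other than "state" (for example a per-CPU "cpu" attribute) are ignored.
    """
    state = attributes.get("state")
    if name in RENAMES:
        return RENAMES[name], value
    if name == "system.cpu.utilization" and state in CPU_STATES:
        return CPU_STATES[state], value * 100  # fraction -> percent
    if name == "system.paging.usage" and state in SWAP_STATES:
        return SWAP_STATES[state], value / BYTES_PER_MB  # bytes -> MB
    return None


def usable_memory_mb(usage_by_state: Dict[str, float]) -> float:
    """Sum the free, cached, and buffered system.memory.usage values and convert to MB."""
    usable_bytes = sum(v for s, v in usage_by_state.items() if s in USABLE_MEMORY_STATES)
    return usable_bytes / BYTES_PER_MB


# Usage: a CPU data point reporting 25% user time on cpu0.
print(map_point("system.cpu.utilization", {"cpu": "cpu0", "state": "user"}, 0.25))
# -> ('system.cpu.user', 25.0)
```

States that the table does not list, such as `nice` for system.cpu.utilization, return `None` in this sketch.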

See OpenTelemetry Metrics Mapping for more information.
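The network rows describe bytes per second, while the receiver reports system.network.io as a cumulative byte counter per device and direction. A minimal Python sketch of deriving a per-second rate from two consecutive samples of that counter follows. `bytes_per_second` is a hypothetical helper, not part of the exporter, and the sketch assumes the counter did not reset between the two samples.

```python
def bytes_per_second(prev_bytes: int, prev_time_s: float, curr_bytes: int, curr_time_s: float) -> float:
    """Hypothetical helper: per-second rate between two cumulative counter samples.

    Assumes both samples come from the same device and direction, the timestamps
    are in seconds, and the counter did not reset between the samples.
    """
    elapsed = curr_time_s - prev_time_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr_bytes - prev_bytes
    if delta < 0:
        raise ValueError("counter decreased; it probably reset, so no rate for this interval")
    return delta / elapsed


# Two samples of system.network.io (direction: receive) taken 10 seconds apart.
print(bytes_per_second(1_000, 0.0, 6_000, 10.0))  # -> 500.0
```

With the 10s collection_interval from the setup configuration, two samples that differ by 5,000 bytes give 500 bytes per second.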

Full example configuration

For a full working example configuration with the Datadog exporter, see host-metrics.yaml.

Example logging output

ResourceMetrics #1
Resource SchemaURL: https://opentelemetry.io/schemas/1.9.0
Resource attributes:
     -> k8s.pod.ip: Str(192.168.63.232)
     -> cloud.provider: Str(aws)
     -> cloud.platform: Str(aws_ec2)
     -> cloud.region: Str(us-east-1)
     -> cloud.account.id: Str(XXXXXXXXX)
     -> cloud.availability_zone: Str(us-east-1c)
     -> host.id: Str(i-07e7d48cedbec9e86)
     -> host.image.id: Str(ami-0cbbb5a8c6f670bb6)
     -> host.type: Str(m5.large)
     -> host.name: Str(ip-192-168-49-157.ec2.internal)
     -> os.type: Str(linux)
     -> kube_app_instance: Str(opentelemetry-collector-gateway)
     -> k8s.pod.name: Str(opentelemetry-collector-gateway-688585b95-l2lds)
     -> k8s.pod.uid: Str(d8063a97-f48f-4e9e-b180-8c78a56d0a37)
     -> k8s.replicaset.uid: Str(9e2d5331-f763-43a3-b0be-9d89c0eaf0cd)
     -> k8s.replicaset.name: Str(opentelemetry-collector-gateway-688585b95)
     -> k8s.deployment.name: Str(opentelemetry-collector-gateway)
     -> kube_app_name: Str(opentelemetry-collector)
     -> k8s.namespace.name: Str(otel-ds-gateway)
     -> k8s.pod.start_time: Str(2023-11-20T12:53:08Z)
     -> k8s.node.name: Str(ip-192-168-49-157.ec2.internal)
ScopeMetrics #0
ScopeMetrics SchemaURL: 
InstrumentationScope otelcol/hostmetricsreceiver/memory 0.88.0-dev
Metric #0
Descriptor:
     -> Name: system.memory.usage
     -> Description: Bytes of memory in use.
     -> Unit: By
     -> DataType: Sum
     -> IsMonotonic: false
     -> AggregationTemporality: Cumulative
NumberDataPoints #0
Data point attributes:
     -> state: Str(used)
StartTimestamp: 2023-08-21 13:45:37 +0000 UTC
Timestamp: 2023-11-20 13:04:19.489045896 +0000 UTC
Value: 1153183744
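As a worked example of the byte-to-MB conversion that the mapping table applies to system.memory.usage values, the following Python sketch extracts the value from the logged data point above and divides it by 2^20. The regular expressions assume the exact layout shown in this output, and `parse_data_point` is a hypothetical helper. The `used` state is not itself a row in the table, so the sketch only illustrates the arithmetic.

```python
import re
from typing import Tuple

# Excerpt of the logging output above: the system.memory.usage data point for state "used".
LOG_EXCERPT = """\
Data point attributes:
     -> state: Str(used)
StartTimestamp: 2023-08-21 13:45:37 +0000 UTC
Timestamp: 2023-11-20 13:04:19.489045896 +0000 UTC
Value: 1153183744
"""

BYTES_PER_MB = 2 ** 20  # divisor used in the mapping table


def parse_data_point(text: str) -> Tuple[str, int]:
    """Return (state, value) from one logged NumberDataPoint block."""
    state = re.search(r"-> state: Str\((\w+)\)", text)
    value = re.search(r"^Value: (\d+)$", text, re.MULTILINE)
    if state is None or value is None:
        raise ValueError("data point block is missing state or value")
    return state.group(1), int(value.group(1))


state, value_bytes = parse_data_point(LOG_EXCERPT)
value_mb = value_bytes / BYTES_PER_MB
print(f"state={state} bytes={value_bytes} MB={value_mb}")
# -> state=used bytes=1153183744 MB=1099.76171875
```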