Docker Daemon

Supported OS: Linux, Mac OS

Note: The Docker Daemon check is still maintained but only works with Agent v5.

To use the Docker integration with Agent v6, see the Agent v6 section below.

Overview

Configure this Agent check to get metrics from the Docker_daemon service in real time to:

  • Visualize and monitor Docker_daemon states.
  • Be notified about Docker_daemon failovers and events.

Setup

Installation

To collect Docker metrics about all your containers, run one Datadog Agent on every host. There are two ways to run the Agent: directly on each host, or within a docker-dd-agent container (recommended).

For either option, your hosts need cgroup memory management enabled for the Docker check to succeed. See the docker-dd-agent repository for how to enable it.

Host installation

  1. Ensure Docker is running on the host.
  2. Install the Agent as described in the Agent installation instructions for your host OS.
  3. Enable the Docker integration tile in the application.
  4. Add the Agent user to the Docker group: usermod -a -G docker dd-agent
  5. Create a docker_daemon.yaml file by copying the example file in the Agent conf.d directory. If you have a standard install of Docker on your host, there shouldn’t be anything you need to change to get the integration to work.
  6. To enable other integrations, use docker ps to identify the ports used by the corresponding applications.
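
Step 5 can be scripted. The sketch below assumes the Agent v5 configuration directory is /etc/dd-agent/conf.d and that the example file is named docker_daemon.yaml.example. Both are assumptions, so adjust them to your install. The helper name enable_docker_check is hypothetical.

```shell
# Hypothetical helper: copy the example Docker check config into place.
# The default directory is an assumed Agent v5 location; pass another one if needed.
enable_docker_check() {
  conf_dir=${1:-/etc/dd-agent/conf.d}
  example="$conf_dir/docker_daemon.yaml.example"
  target="$conf_dir/docker_daemon.yaml"

  if [ ! -f "$example" ]; then
    echo "example file not found: $example" >&2
    return 1
  fi
  # Never overwrite a config that already exists.
  if [ -f "$target" ]; then
    echo "already present: $target"
    return 0
  fi
  cp "$example" "$target" && echo "created: $target"
}

# Typical use on a host (run as root), then restart the Agent:
#   usermod -a -G docker dd-agent
#   enable_docker_check
```

The helper refuses to overwrite an existing docker_daemon.yaml, so it is safe to run again after you have customized the file.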

Container installation

  1. Ensure Docker is running on the host.

  2. As per the Docker container installation instructions, run:

     docker run -d --name dd-agent \
       -v /var/run/docker.sock:/var/run/docker.sock:ro \
       -v /proc/:/host/proc/:ro \
       -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
       -e API_KEY={YOUR_DD_API_KEY} \
       datadog/docker-dd-agent:latest
    

In the command above, your API key is passed to the Datadog Agent with Docker’s -e environment variable flag. Other variables include:

API_KEY
  Sets your Datadog API key.

DD_HOSTNAME
  Sets the hostname in the Agent container’s datadog.conf file. If this variable is not set, the Agent container defaults to using the Name field (as reported by the docker info command) as the Agent container hostname.

DD_URL
  Sets the Datadog intake server URL where the Agent sends data. This is useful when using the Agent as a proxy.

LOG_LEVEL
  Sets logging verbosity (CRITICAL, ERROR, WARNING, INFO, DEBUG). For example, -e LOG_LEVEL=DEBUG sets logging to debug mode.

TAGS
  Sets host tags as a comma-delimited string. Both simple tags and key-value tags are available, for example: -e TAGS="simple-tag, tag-key:tag-value".

EC2_TAGS
  Enabling this feature allows the Agent to query and capture custom tags set using the EC2 API during startup. To enable, use -e EC2_TAGS=yes. Note: This feature requires an IAM role associated with the instance.

NON_LOCAL_TRAFFIC
  Enabling this feature allows StatsD reporting from any external IP. To enable, use -e NON_LOCAL_TRAFFIC=yes. This is used to report metrics from other containers or systems. See network configuration for more details.

PROXY_HOST, PROXY_PORT, PROXY_USER, PROXY_PASSWORD
  Sets proxy configuration details. Note: PROXY_PASSWORD is required for passing in an authentication password and cannot be renamed. For more information, see the Agent proxy documentation.

SD_BACKEND, SD_CONFIG_BACKEND, SD_BACKEND_HOST, SD_BACKEND_PORT, SD_TEMPLATE_DIR, SD_CONSUL_TOKEN
  Enables and configures Autodiscovery. For more information, see the Autodiscovery guide.

Note: Add --restart=unless-stopped if you want Docker to restart the Agent container automatically unless you explicitly stop it.
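
As an illustration of combining these options, the sketch below wraps the run command in a function and adds the restart policy plus two of the variables described above. The function name run_dd_agent and the DOCKER override are hypothetical conveniences, not part of docker-dd-agent.

```shell
# DOCKER can be overridden (for example DOCKER=echo) to preview the command
# instead of running it.
DOCKER=${DOCKER:-docker}

# Hypothetical wrapper: start the Agent container with extra settings.
# $1 is your Datadog API key.
run_dd_agent() {
  api_key=$1
  "$DOCKER" run -d --name dd-agent \
    --restart=unless-stopped \
    -v /var/run/docker.sock:/var/run/docker.sock:ro \
    -v /proc/:/host/proc/:ro \
    -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
    -e API_KEY="$api_key" \
    -e LOG_LEVEL=DEBUG \
    -e TAGS="simple-tag, tag-key:tag-value" \
    datadog/docker-dd-agent:latest
}
```

Setting DOCKER=echo before calling run_dd_agent prints the assembled arguments, which is a cheap way to review the command before it touches a real host.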

Running the Agent container on Amazon Linux

To run the Datadog Agent container on Amazon Linux, make this change to the cgroup volume mount location:

docker run -d --name dd-agent \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /proc/:/host/proc/:ro \
  -v /cgroup/:/host/sys/fs/cgroup:ro \
  -e API_KEY={YOUR_DD_API_KEY} \
  datadog/docker-dd-agent:latest
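
If you are unsure which cgroup location a host uses, one way to find out is to look for the memory cgroup mount in /proc/mounts. The sketch below is a heuristic under that assumption: it only recognizes cgroup v1 mounts, and the helper name cgroup_root is hypothetical.

```shell
# Hypothetical helper: print the directory that holds the memory cgroup mount.
# Reads /proc/mounts by default; pass another file to try it on sample data.
cgroup_root() {
  mounts_file=${1:-/proc/mounts}
  # Fields in /proc/mounts: device, mount point, filesystem type, options, ...
  mount_point=$(awk '$3 == "cgroup" && $4 ~ /(^|,)memory(,|$)/ { print $2; exit }' "$mounts_file")
  if [ -z "$mount_point" ]; then
    echo "no memory cgroup mount found" >&2
    return 1
  fi
  # /cgroup/memory -> /cgroup, /sys/fs/cgroup/memory -> /sys/fs/cgroup
  dirname "$mount_point"
}
```

The result can then be used as the source of the cgroup volume, for example -v "$(cgroup_root)/:/host/sys/fs/cgroup:ro". A failure here can also indicate that cgroup memory management is not enabled on the host.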

Alpine Linux based container

The standard Docker image is based on Debian Linux, but as of Datadog Agent v5.7, there is an Alpine Linux based image. The Alpine Linux image is considerably smaller in size than the traditional Debian-based image. It also inherits Alpine’s security-oriented design.

To use the Alpine Linux image, append -alpine to the version tag. For example:

docker run -d --name dd-agent \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /proc/:/host/proc/:ro \
  -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
  -e API_KEY={YOUR_DD_API_KEY} \
  datadog/docker-dd-agent:latest-alpine

Image versioning

Starting with version 5.5.0 of the Datadog Agent, the Docker image follows a new versioning pattern. This allows Datadog to release changes to the Docker image without changing the version of the Agent it bundles.

The Docker image version has the following pattern: X.Y.Z, where X is the major version of the Docker image, Y is the minor version of the Docker image, and Z represents the Agent version.

For example, the first version of the Docker image that bundles the Datadog Agent 5.5.0 is: 10.0.550
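
To make the pattern concrete, the sketch below splits an image version into its three parts. The function name split_image_version is hypothetical. Reading the last part as the Agent version with its dots removed (550 for 5.5.0) is inferred from the single example above, so the function reports that part as-is rather than converting it.

```shell
# Hypothetical helper: split X.Y.Z into image major, image minor, and Agent part.
split_image_version() {
  version=$1
  image_major=${version%%.*}   # text before the first dot
  rest=${version#*.}           # text after the first dot
  image_minor=${rest%%.*}      # text between the first and second dot
  agent=${rest#*.}             # text after the second dot
  printf 'image_major=%s image_minor=%s agent=%s\n' "$image_major" "$image_minor" "$agent"
}

split_image_version 10.0.550
# prints: image_major=10 image_minor=0 agent=550
```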

Custom containers and additional information

For more information about building custom Docker containers with the Datadog Agent, the Alpine Linux based image, versioning, and more, see the docker-dd-agent project on GitHub.

Validation

Run the Agent’s status subcommand and look for docker_daemon under the Checks section.
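
If you want to automate this check, a minimal sketch is to pipe the status output through a filter. The helper name docker_check_listed is hypothetical, and the test is deliberately loose: it only confirms that docker_daemon appears somewhere in the output, not that it sits under the Checks section or that the check is healthy.

```shell
# Hypothetical helper: read Agent status output on stdin and succeed
# if the docker_daemon check is mentioned anywhere in it.
docker_check_listed() {
  grep -q 'docker_daemon'
}

# Usage (replace the placeholder with the status command for your install):
#   <agent status command> | docker_check_listed && echo "docker_daemon check found"
```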

Agent v6

The latest Docker check is named docker and is written in Go to take advantage of the new internal architecture. Starting with v6.0, the Agent no longer loads the docker_daemon check, although the check remains available and maintained for Agent v5. All features are ported to Agent v6.0 and later, except for the following deprecations:

  • The url, api_version and tags* options are deprecated. Direct use of the standard Docker environment variables is encouraged.
  • The ecs_tags, performance_tags and container_tags options are deprecated. Every relevant tag is collected by default.
  • The collect_container_count option to enable the docker.container.count metric is not supported. Use docker.containers.running and docker.containers.stopped instead.

Some options have moved from docker_daemon.yaml to the main datadog.yaml:

  • collect_labels_as_tags has been renamed docker_labels_as_tags and supports high cardinality tags. See the details in datadog.yaml.example.
  • The exclude and include lists have been renamed ac_exclude and ac_include. To make filtering consistent across all components of the Agent, filtering on arbitrary tags has been dropped. The only supported filtering tags are image (image name) and name (container name). Regexp filtering is still available; see datadog.yaml.example for examples.
  • The docker_root option has been split in two options: container_cgroup_root and container_proc_root.
  • exclude_pause_container has been added to exclude pause containers on Kubernetes and OpenShift (defaults to true). This avoids removing them from the exclude list by mistake.
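
For orientation, the sketch below prints a datadog.yaml fragment that uses the options listed above. Every value is illustrative: the container name, image name, label name, and root paths are assumptions, and datadog.yaml.example remains the authoritative reference for syntax. The function name print_container_settings is hypothetical.

```shell
# Hypothetical helper: print an illustrative datadog.yaml fragment (Agent v6).
print_container_settings() {
  cat <<'EOF'
# All values below are examples, not defaults.
ac_exclude: ["name:dd-agent"]
ac_include: ["image:redis"]
exclude_pause_container: true
container_cgroup_root: /host/sys/fs/cgroup/
container_proc_root: /host/proc
docker_labels_as_tags:
  com.example.team: team
EOF
}

print_container_settings
```

Printing the fragment rather than writing it to disk keeps the sketch harmless; merge the keys you need into your own datadog.yaml by hand.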

Additional changes:

The import command converts the old docker_daemon.yaml to the new docker.yaml. The command also moves needed settings from docker_daemon.yaml to datadog.yaml.
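
A hedged sketch of invoking the import command follows. It assumes the Agent v6 binary is on the PATH as datadog-agent, that the command takes the old configuration directory followed by the new one, and that those directories are /etc/dd-agent and /etc/datadog-agent. Confirm all three against your own install before running it. The wrapper name import_v5_config and the AGENT override are hypothetical.

```shell
# AGENT can be overridden (for example AGENT=echo) to preview the command.
AGENT=${AGENT:-datadog-agent}

# Hypothetical wrapper: convert an Agent v5 configuration for Agent v6.
# $1: old configuration directory (assumed default /etc/dd-agent)
# $2: new configuration directory (assumed default /etc/datadog-agent)
import_v5_config() {
  old_dir=${1:-/etc/dd-agent}
  new_dir=${2:-/etc/datadog-agent}
  "$AGENT" import "$old_dir" "$new_dir"
}
```

Back up both directories first, since the command writes the converted files into the destination.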

Data Collected

Metrics

docker.container.open_fds
(gauge)
The number of open file descriptors
Shown as file
docker.container.size_rootfs
(gauge)
Total size of all the files in the container
Shown as byte
docker.container.size_rootfs.95percentile
(gauge)
95th percentile of docker.container.size_rootfs
Shown as byte
docker.container.size_rootfs.avg
(gauge)
Average value of docker.container.size_rootfs
Shown as byte
docker.container.size_rootfs.count
(rate)
The rate that the value of docker.container.size_rootfs was sampled
Shown as sample
docker.container.size_rootfs.max
(gauge)
Max value of docker.container.size_rootfs
Shown as byte
docker.container.size_rootfs.median
(gauge)
Median value of docker.container.size_rootfs
Shown as byte
docker.container.size_rw
(gauge)
Total size of all the files in the container which have been created or changed by processes running in the container
Shown as byte
docker.container.size_rw.95percentile
(gauge)
95th percentile of docker.container.size_rw
Shown as byte
docker.container.size_rw.avg
(gauge)
Average value of docker.container.size_rw
Shown as byte
docker.container.size_rw.count
(rate)
The rate that the value of docker.container.size_rw was sampled
Shown as sample
docker.container.size_rw.max
(gauge)
Max value of docker.container.size_rw
Shown as byte
docker.container.size_rw.median
(gauge)
Median value of docker.container.size_rw
Shown as byte
docker.containers.running
(gauge)
The number of containers running on this host tagged by image
docker.containers.running.total
(gauge)
The total number of containers running on this host
docker.containers.stopped
(gauge)
The number of containers stopped on this host tagged by image
docker.containers.stopped.total
(gauge)
The total number of containers stopped on this host
docker.cpu.limit
(gauge)
Limit on CPU available to the container, expressed as percentage of a core
Shown as percent
docker.cpu.shares
(gauge)
Shares of CPU usage allocated to the container
docker.cpu.system
(gauge)
The percent of time the CPU is executing system calls on behalf of processes of this container, unnormalized
Shown as percent
docker.cpu.system.95percentile
(gauge)
95th percentile of docker.cpu.system [deprecated in agent 6.0]
Shown as percent
docker.cpu.system.avg
(gauge)
Average value of docker.cpu.system [deprecated in agent 6.0]
Shown as percent
docker.cpu.system.count
(rate)
The rate that the value of docker.cpu.system was sampled [deprecated in agent 6.0]
Shown as sample
docker.cpu.system.max
(gauge)
Max value of docker.cpu.system
Shown as percent
docker.cpu.system.median
(gauge)
Median value of docker.cpu.system [deprecated in agent 6.0]
Shown as percent
docker.cpu.throttled
(gauge)
Number of times the cgroup has been throttled
docker.cpu.usage
(gauge)
The percent of CPU time obtained by this container
Shown as percent
docker.cpu.user
(gauge)
The percent of time the CPU is under direct control of processes of this container, unnormalized
Shown as percent
docker.cpu.user.95percentile
(gauge)
95th percentile of docker.cpu.user [deprecated in agent 6.0]
Shown as percent
docker.cpu.user.avg
(gauge)
Average value of docker.cpu.user [deprecated in agent 6.0]
Shown as percent
docker.cpu.user.count
(rate)
The rate that the value of docker.cpu.user was sampled [deprecated in agent 6.0]
Shown as sample
docker.cpu.user.max
(gauge)
Max value of docker.cpu.user [deprecated in agent 6.0]
Shown as percent
docker.cpu.user.median
(gauge)
Median value of docker.cpu.user [deprecated in agent 6.0]
Shown as percent
docker.data.free
(gauge)
Storage pool disk space free
Shown as byte
docker.data.percent
(gauge)
The percent of storage pool used
Shown as percent
docker.data.total
(gauge)
Storage pool disk space total
Shown as byte
docker.data.used
(gauge)
Storage pool disk space used
Shown as byte
docker.image.size
(gauge)
Size of all layers of the image on disk
Shown as byte
docker.image.virtual_size
(gauge)
Size of all layers of the image on disk
Shown as byte
docker.images.available
(gauge)
The number of top-level images
docker.images.intermediate
(gauge)
The number of intermediate images, which are intermediate layers that make up other images
docker.io.read_bytes
(gauge)
Bytes read per second from disk by the processes of the container
Shown as byte
docker.io.read_bytes.95percentile
(gauge)
95th percentile of docker.io.read_bytes [deprecated in agent 6.0]
Shown as byte
docker.io.read_bytes.avg
(gauge)
Average value of docker.io.read_bytes [deprecated in agent 6.0]
Shown as byte
docker.io.read_bytes.count
(rate)
The rate that the value of docker.io.read_bytes was sampled [deprecated in agent 6.0]
Shown as sample
docker.io.read_bytes.max
(gauge)
Max value of docker.io.read_bytes [deprecated in agent 6.0]
Shown as byte
docker.io.read_bytes.median
(gauge)
Median value of docker.io.read_bytes [deprecated in agent 6.0]
Shown as byte
docker.io.write_bytes
(gauge)
Bytes written per second to disk by the processes of the container
Shown as byte
docker.io.write_bytes.95percentile
(gauge)
95th percentile of docker.io.write_bytes [deprecated in agent 6.0]
Shown as byte
docker.io.write_bytes.avg
(gauge)
Average value of docker.io.write_bytes [deprecated in agent 6.0]
Shown as byte
docker.io.write_bytes.count
(rate)
The rate that the value of docker.io.write_bytes was sampled [deprecated in agent 6.0]
Shown as sample
docker.io.write_bytes.max
(gauge)
Max value of docker.io.write_bytes [deprecated in agent 6.0]
Shown as byte
docker.io.write_bytes.median
(gauge)
Median value of docker.io.write_bytes [deprecated in agent 6.0]
Shown as byte
docker.kmem.usage
(gauge)
The amount of kernel memory that belongs to the container's processes.
Shown as byte
docker.mem.cache
(gauge)
The amount of memory that is being used to cache data from disk (e.g. memory contents that can be associated precisely with a block on a block device)
Shown as byte
docker.mem.cache.95percentile
(gauge)
95th percentile value of docker.mem.cache [deprecated in agent 6.0]
Shown as byte
docker.mem.cache.avg
(gauge)
Average value of docker.mem.cache [deprecated in agent 6.0]
Shown as byte
docker.mem.cache.count
(rate)
The rate that the value of docker.mem.cache was sampled [deprecated in agent 6.0]
Shown as sample
docker.mem.cache.max
(gauge)
Max value of docker.mem.cache [deprecated in agent 6.0]
Shown as byte
docker.mem.cache.median
(gauge)
Median value of docker.mem.cache [deprecated in agent 6.0]
Shown as byte
docker.mem.in_use
(gauge)
The fraction of used memory to available memory, if the limit is set
Shown as fraction
docker.mem.in_use.95percentile
(gauge)
95th percentile of docker.mem.in_use [deprecated in agent 6.0]
Shown as fraction
docker.mem.in_use.avg
(gauge)
Average value of docker.mem.in_use [deprecated in agent 6.0]
Shown as fraction
docker.mem.in_use.count
(rate)
The rate that the value of docker.mem.in_use was sampled [deprecated in agent 6.0]
Shown as sample
docker.mem.in_use.max
(gauge)
Max value of docker.mem.in_use [deprecated in agent 6.0]
Shown as fraction
docker.mem.in_use.median
(gauge)
Median value of docker.mem.in_use [deprecated in agent 6.0]
Shown as fraction
docker.mem.limit
(gauge)
The memory limit for the container, if set
Shown as byte
docker.mem.limit.95percentile
(gauge)
95th percentile of docker.mem.limit. Ordinarily this value will not change [deprecated in agent 6.0]
Shown as byte
docker.mem.limit.avg
(gauge)
Average value of docker.mem.limit. Ordinarily this value will not change [deprecated in agent 6.0]
Shown as byte
docker.mem.limit.count
(rate)
The rate that the value of docker.mem.limit was sampled [deprecated in agent 6.0]
Shown as sample
docker.mem.limit.max
(gauge)
Max value of docker.mem.limit. Ordinarily this value will not change [deprecated in agent 6.0]
Shown as byte
docker.mem.limit.median
(gauge)
Median value of docker.mem.limit. Ordinarily this value will not change [deprecated in agent 6.0]
Shown as byte
docker.mem.rss
(gauge)
The amount of non-cache memory that belongs to the container's processes. Used for stacks, heaps, etc.
Shown as byte
docker.mem.rss.95percentile
(gauge)
95th percentile value of docker.mem.rss [deprecated in agent 6.0]
Shown as byte
docker.mem.rss.avg
(gauge)
Average value of docker.mem.rss [deprecated in agent 6.0]
Shown as byte
docker.mem.rss.count
(rate)
The rate that the value of docker.mem.rss was sampled [deprecated in agent 6.0]
Shown as sample
docker.mem.rss.max
(gauge)
Max value of docker.mem.rss [deprecated in agent 6.0]
Shown as byte
docker.mem.rss.median
(gauge)
Median value of docker.mem.rss [deprecated in agent 6.0]
Shown as byte
docker.mem.soft_limit
(gauge)
The memory reservation limit for the container, if set
Shown as byte
docker.mem.soft_limit.95percentile
(gauge)
95th percentile of docker.mem.soft_limit. Ordinarily this value will not change
Shown as byte
docker.mem.soft_limit.avg
(gauge)
Average value of docker.mem.soft_limit. Ordinarily this value will not change
Shown as byte
docker.mem.soft_limit.count
(rate)
The rate that the value of docker.mem.soft_limit was sampled
Shown as sample
docker.mem.soft_limit.max
(gauge)
Max value of docker.mem.soft_limit. Ordinarily this value will not change
Shown as byte
docker.mem.soft_limit.median
(gauge)
Median value of docker.mem.soft_limit. Ordinarily this value will not change
Shown as byte
docker.mem.sw_in_use
(gauge)
The fraction of used swap + memory to available swap + memory, if the limit is set
Shown as fraction
docker.mem.sw_in_use.95percentile
(gauge)
95th percentile of docker.mem.sw_in_use [deprecated in agent 6.0]
Shown as fraction
docker.mem.sw_in_use.avg
(gauge)
Average value of docker.mem.sw_in_use [deprecated in agent 6.0]
Shown as fraction
docker.mem.sw_in_use.count
(rate)
The rate that the value of docker.mem.sw_in_use was sampled [deprecated in agent 6.0]
Shown as sample
docker.mem.sw_in_use.max
(gauge)
Max value of docker.mem.sw_in_use [deprecated in agent 6.0]
Shown as fraction
docker.mem.sw_in_use.median
(gauge)
Median value of docker.mem.sw_in_use [deprecated in agent 6.0]
Shown as fraction
docker.mem.sw_limit
(gauge)
The swap + memory limit for the container, if set
Shown as byte
docker.mem.sw_limit.95percentile
(gauge)
95th percentile of docker.mem.sw_limit. Ordinarily this value will not change [deprecated in agent 6.0]
Shown as byte
docker.mem.sw_limit.avg
(gauge)
Average value of docker.mem.sw_limit. Ordinarily this value will not change [deprecated in agent 6.0]
Shown as byte
docker.mem.sw_limit.count
(rate)
The rate that the value of docker.mem.sw_limit was sampled [deprecated in agent 6.0]
Shown as sample
docker.mem.sw_limit.max
(gauge)
Max value of docker.mem.sw_limit. Ordinarily this value will not change [deprecated in agent 6.0]
Shown as byte
docker.mem.sw_limit.median
(gauge)
Median value of docker.mem.sw_limit. Ordinarily this value will not change [deprecated in agent 6.0]
Shown as byte
docker.mem.swap
(gauge)
The amount of swap currently used by the container
Shown as byte
docker.mem.swap.95percentile
(gauge)
95th percentile value of docker.mem.swap [deprecated in agent 6.0]
Shown as byte
docker.mem.swap.avg
(gauge)
Average value of docker.mem.swap [deprecated in agent 6.0]
Shown as byte
docker.mem.swap.count
(rate)
The rate that the value of docker.mem.swap was sampled [deprecated in agent 6.0]
Shown as sample
docker.mem.swap.max
(gauge)
Max value of docker.mem.swap [deprecated in agent 6.0]
Shown as byte
docker.mem.swap.median
(gauge)
Median value of docker.mem.swap [deprecated in agent 6.0]
Shown as byte
docker.metadata.free
(gauge)
Storage pool metadata space free
Shown as byte
docker.metadata.percent
(gauge)
The percent of storage pool metadata used
Shown as percent
docker.metadata.total
(gauge)
Storage pool metadata space total
Shown as byte
docker.metadata.used
(gauge)
Storage pool metadata space used
Shown as byte
docker.net.bytes_rcvd
(gauge)
Bytes received per second from the network
Shown as byte
docker.net.bytes_rcvd.95percentile
(gauge)
95th percentile of docker.net.bytes_rcvd [deprecated in agent 6.0]
Shown as byte
docker.net.bytes_rcvd.avg
(gauge)
Average value of docker.net.bytes_rcvd [deprecated in agent 6.0]
Shown as byte
docker.net.bytes_rcvd.count
(rate)
The rate that the value of docker.net.bytes_rcvd was sampled [deprecated in agent 6.0]
Shown as sample
docker.net.bytes_rcvd.max
(gauge)
Max value of docker.net.bytes_rcvd [deprecated in agent 6.0]
Shown as byte
docker.net.bytes_rcvd.median
(gauge)
Median value of docker.net.bytes_rcvd [deprecated in agent 6.0]
Shown as byte
docker.net.bytes_sent
(gauge)
Bytes sent per second to the network
Shown as byte
docker.net.bytes_sent_bytes.95percentile
(gauge)
95th percentile of docker.net.bytes_sent_bytes [deprecated in agent 6.0]
Shown as byte
docker.net.bytes_sent_bytes.avg
(gauge)
Average value of docker.net.bytes_sent_bytes [deprecated in agent 6.0]
Shown as byte
docker.net.bytes_sent_bytes.count
(rate)
The rate that the value of docker.net.bytes_sent_bytes was sampled [deprecated in agent 6.0]
Shown as sample
docker.net.bytes_sent_bytes.max
(gauge)
Max value of docker.net.bytes_sent_bytes [deprecated in agent 6.0]
Shown as byte
docker.net.bytes_sent_bytes.median
(gauge)
Median value of docker.net.bytes_sent_bytes [deprecated in agent 6.0]
Shown as byte
docker.thread.count
(gauge)
Current thread count for the container
Shown as thread
docker.thread.limit
(gauge)
Thread count limit for the container, if set
Shown as thread
docker.uptime
(gauge)
Time since the container was started
Shown as second

Events

The Docker integration produces the following events:

  • Delete Image
  • Die
  • Error
  • Fail
  • Kill
  • Out of memory (oom)
  • Pause
  • Restart container
  • Restart Daemon
  • Update

Service Checks

docker.service_up
Returns CRITICAL if the Agent is unable to collect the list of containers from the Docker daemon. Returns OK otherwise.
Statuses: ok, critical

docker.container_health
Returns CRITICAL if a container is unhealthy, UNKNOWN if its health is unknown, and OK otherwise.
Statuses: ok, critical, unknown

docker.exit
Returns CRITICAL if a container exited with a non-zero exit code. Returns OK otherwise.
Statuses: ok, critical

Note: To use docker.exit, add collect_exit_codes: true in your Docker YAML file and restart the Agent.
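
As a sketch of where the option goes, the function below prints a minimal Agent v5 docker_daemon.yaml with exit-code collection turned on. The url value and the placement of collect_exit_codes under the instance are assumptions based on the usual layout of the example file, so check them against the docker_daemon.yaml.example shipped with your Agent. The function name print_docker_daemon_yaml is hypothetical.

```shell
# Hypothetical helper: print a minimal docker_daemon.yaml that enables
# the docker.exit service check.
print_docker_daemon_yaml() {
  cat <<'EOF'
init_config:

instances:
  - url: "unix://var/run/docker.sock"
    collect_exit_codes: true
EOF
}

print_docker_daemon_yaml
```

If you already have a docker_daemon.yaml, adding the single collect_exit_codes line to the existing instance is enough; there is no need to replace the file.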

Troubleshooting

Need help? Contact Datadog support.
