One challenge of monitoring containerized infrastructure is that containers can move from host to host. This dynamic behavior makes containerized systems difficult to monitor manually.
To solve this, you can use Datadog’s Autodiscovery feature to automatically identify the services running in a container and gather data from them. Whenever a container starts, the Datadog Agent identifies which services are running in the new container, looks for the corresponding monitoring configuration, and starts collecting metrics.
Autodiscovery lets you define configuration templates for Agent checks and specify which containers each check should apply to.
The Agent watches for container events such as creation, destruction, start, and stop, and on each event it enables, disables, or regenerates static check configurations. As the Agent inspects each running container, it checks whether the container matches any of the Autodiscovery container identifiers in the loaded templates. For each match, the Agent generates a static check configuration by substituting the template variables with the matching container’s specific values, and then enables the check using that static configuration.
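The event-driven flow above can be modeled in a few lines. This is a minimal sketch, not the Agent's implementation. The Container and Template types, the event names, and matching on a single identifier string are all hypothetical. Only the %%host%% template variable is resolved.

```python
from dataclasses import dataclass


@dataclass
class Container:
    id: str
    identifier: str  # hypothetical: e.g. the short image name
    host: str        # container IP address


@dataclass
class Template:
    identifiers: set  # Autodiscovery container identifiers
    check: str        # check name, e.g. "redisdb"
    instance: dict    # may contain the %%host%% template variable


class AutodiscoverySketch:
    """Hypothetical model of the Agent reacting to container events."""

    def __init__(self, templates):
        self.templates = templates
        self.enabled = {}  # container id -> list of (check, static instance)

    def on_event(self, kind, container):
        if kind in ("create", "start"):
            # (Re)generate static configs for every template that matches.
            configs = [
                (t.check, self._render(t.instance, container))
                for t in self.templates
                if container.identifier in t.identifiers
            ]
            if configs:
                self.enabled[container.id] = configs
            else:
                self.enabled.pop(container.id, None)
        elif kind in ("stop", "destroy"):
            # Disable any checks scheduled for this container.
            self.enabled.pop(container.id, None)

    @staticmethod
    def _render(instance, container):
        # Build a new dict so the template itself stays untouched.
        return {
            k: v.replace("%%host%%", container.host) if isinstance(v, str) else v
            for k, v in instance.items()
        }


redis_tpl = Template({"redis"}, "redisdb", {"host": "%%host%%", "port": 6379})
ad = AutodiscoverySketch([redis_tpl])
ad.on_event("start", Container("c1", "redis", "10.0.0.7"))
print(ad.enabled)  # {'c1': [('redisdb', {'host': '10.0.0.7', 'port': 6379})]}
ad.on_event("stop", Container("c1", "redis", "10.0.0.7"))
print(ad.enabled)  # {}
```

Keeping enabled checks keyed by container ID makes a stop or destroy event a simple removal. A later start event regenerates the configuration from the template.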
In the figure above, a host node runs three pods, including a Redis pod and an Agent pod. The Kubelet, which manages the containers on this node, runs as a binary on the node and exposes the endpoints /metrics and /pods. Every 10 seconds, the Agent queries /pods and finds the Redis spec, along with information about the Redis pod itself.
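To picture what the Agent reads from the Kubelet, the sketch below walks a trimmed /pods response and picks out containers that carry a checks annotation. It makes three assumptions:

- The response has already been fetched and decoded into a dict shaped like a Kubernetes PodList (items, metadata.annotations, spec.containers, status.podIP).
- The annotation key follows the ad.datadoghq.com/<container>.checks pattern.
- The sample data is hypothetical.

The HTTP call and the 10-second polling loop are left out.

```python
# Hypothetical, trimmed PodList in the shape returned by the Kubelet /pods endpoint.
SAMPLE_PODS = {
    "items": [
        {
            "metadata": {
                "name": "redis",
                "annotations": {
                    "ad.datadoghq.com/redis.checks":
                        '{"redisdb": {"instances": [{"host": "%%host%%"}]}}'
                },
            },
            "spec": {"containers": [{"name": "redis"}]},
            "status": {"podIP": "10.0.0.7"},
        },
        {
            "metadata": {"name": "web", "annotations": {}},
            "spec": {"containers": [{"name": "nginx"}]},
            "status": {"podIP": "10.0.0.8"},
        },
    ]
}


def containers_with_templates(pod_list):
    """Yield (pod name, container name, pod IP, raw template) for annotated containers."""
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        annotations = meta.get("annotations") or {}
        pod_ip = pod.get("status", {}).get("podIP")
        for container in pod.get("spec", {}).get("containers", []):
            # The container name is embedded in the annotation key.
            key = f"ad.datadoghq.com/{container['name']}.checks"
            if key in annotations:
                yield meta.get("name"), container["name"], pod_ip, annotations[key]


for found in containers_with_templates(SAMPLE_PODS):
    print(found)
```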
The Redis spec in this example includes the following annotations:
In the example above, the tags.datadoghq.com labels set env, service, and version as tags for all logs and metrics emitted by the Redis pod. These standard labels are part of Unified Service Tagging, which Datadog recommends as a best practice when configuring tags and environment variables.
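The effect of these labels can be sketched as a simple mapping from label keys to key:value tags. This illustrates the naming convention only. It assumes the three label keys tags.datadoghq.com/env, tags.datadoghq.com/service, and tags.datadoghq.com/version, and it is not how the Agent computes tags internally. The label values are hypothetical.

```python
UST_PREFIX = "tags.datadoghq.com/"
UST_KEYS = ("env", "service", "version")


def unified_service_tags(labels):
    """Turn tags.datadoghq.com/* pod labels into key:value tags (sketch)."""
    tags = []
    for key in UST_KEYS:
        value = labels.get(UST_PREFIX + key)
        if value:
            tags.append(f"{key}:{value}")
    return tags


labels = {
    "tags.datadoghq.com/env": "prod",
    "tags.datadoghq.com/service": "my-redis",
    "tags.datadoghq.com/version": "6.0.3",
    "app": "redis",  # ordinary label, ignored by this sketch
}
print(unified_service_tags(labels))  # ['env:prod', 'service:my-redis', 'version:6.0.3']
```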
redisdb is the name of the check to run. init_config is optional and contains configuration parameters such as the minimum collection interval. Each item in instances represents the configuration to run for one instance of the check. Note: In this example, %%host%% is a template variable that is dynamically populated with your container’s IP address.
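The structure described above can be read with a few lines of Python: a check name maps to an optional init_config and a list of instances. This sketch assumes the annotation value is a JSON object keyed by check name, and it uses hypothetical Redis values. It only validates the shape and does not resolve template variables.

```python
import json

RAW = """
{
  "redisdb": {
    "init_config": {},
    "instances": [{"host": "%%host%%", "port": "6379"}]
  }
}
"""


def parse_checks_annotation(raw):
    """Return a list of (check name, init_config, instances) tuples."""
    parsed = json.loads(raw)
    checks = []
    for name, body in parsed.items():
        init_config = body.get("init_config") or {}  # optional
        instances = body.get("instances")
        if not isinstance(instances, list) or not instances:
            raise ValueError(f"check {name!r} needs at least one instance")
        checks.append((name, init_config, instances))
    return checks


for name, init_config, instances in parse_checks_annotation(RAW):
    print(name, init_config, instances)
# redisdb {} [{'host': '%%host%%', 'port': '6379'}]
```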
In the older (v1) annotation format, the same configuration is split across three annotations. check_names lists the names of the checks to run, and init_configs contains configuration parameters such as the minimum collection interval. Each item in instances represents the configuration to run for one instance of a check. As before, %%host%% is a template variable that is dynamically populated with your container’s IP address.
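In this older format, the three annotations are parallel JSON lists that line up by position. The first check name goes with the first init_configs entry and the first instances entry, and so on. The sketch below pairs them up under that assumption, with one instance object per check as in typical examples. The annotation values are hypothetical, and the code only reshapes the data.

```python
import json

ANNOTATIONS = {
    "ad.datadoghq.com/redis.check_names": '["redisdb"]',
    "ad.datadoghq.com/redis.init_configs": "[{}]",
    "ad.datadoghq.com/redis.instances": '[{"host": "%%host%%", "port": "6379"}]',
}


def parse_v1_annotations(annotations, container):
    """Zip the three v1 lists into (check name, init_config, [instance]) tuples."""
    prefix = f"ad.datadoghq.com/{container}."
    names = json.loads(annotations[prefix + "check_names"])
    init_configs = json.loads(annotations[prefix + "init_configs"])
    instances = json.loads(annotations[prefix + "instances"])
    if not (len(names) == len(init_configs) == len(instances)):
        raise ValueError("check_names, init_configs and instances must have the same length")
    # Assumption: one instance object per check name, matched by position.
    return [(n, c, [i]) for n, c, i in zip(names, init_configs, instances)]


print(parse_v1_annotations(ANNOTATIONS, "redis"))
# [('redisdb', {}, [{'host': '%%host%%', 'port': '6379'}])]
```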
From this, the Agent generates a static check configuration.
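Generating the static configuration amounts to walking the template and replacing each %%...%% variable with a concrete value. The sketch below handles only %%host%%, %%port%%, and %%env_<VAR>%%, and it raises on anything else; the real Agent supports more variables. The facts dict and its values are hypothetical. %%env_<VAR>%% is modeled as reading environment variables visible to the Agent process.

```python
import re

VARIABLE = re.compile(r"%%([A-Za-z0-9_]+)%%")


def resolve(value, facts):
    """Recursively replace %%var%% placeholders in a template (sketch)."""
    if isinstance(value, dict):
        return {k: resolve(v, facts) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v, facts) for v in value]
    if not isinstance(value, str):
        return value

    def substitute(match):
        name = match.group(1)
        if name in ("host", "port"):
            return str(facts[name])           # values of the matched container
        if name.startswith("env_"):
            return facts["env"][name[len("env_"):]]  # Agent-side environment
        raise KeyError(f"unsupported template variable: {name}")

    return VARIABLE.sub(substitute, value)


template = {"init_config": {}, "instances": [
    {"host": "%%host%%", "port": "%%port%%", "password": "%%env_REDIS_PASSWORD%%"}]}
facts = {"host": "10.0.0.7", "port": 6379, "env": {"REDIS_PASSWORD": "s3cret"}}
print(resolve(template, facts))
# {'init_config': {}, 'instances': [{'host': '10.0.0.7', 'port': '6379', 'password': 's3cret'}]}
```

Building a new structure instead of editing the template in place keeps the template reusable for the next matching container.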
The Agent automatically detects reachable sockets and API endpoints (such as Docker, containerd, and the Kubernetes API) and activates Autodiscovery for you.
If Autodiscovery is not working, verify the detected features by running agent status.
If automatic detection fails, or if you want to deactivate an automatically detected feature, use these configuration parameters in datadog.yaml to include or exclude features:
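The include/exclude behavior can be modeled as set arithmetic over the detected features. This is a sketch of the semantics described above, not the Agent's code, and the feature names are illustrative. Current Datadog documentation names the parameters autoconfig_include_features and autoconfig_exclude_features; confirm them against the datadog.yaml reference for your Agent version. The precedence rule in the code is an assumption.

```python
def effective_features(detected, include=(), exclude=()):
    """Sketch: start from auto-detected features, add forced ones, drop excluded.

    Assumption for illustration: if a feature is both included and
    excluded, the exclusion wins.
    """
    return (set(detected) | set(include)) - set(exclude)


detected = {"docker", "kubernetes"}
# e.g. detection missed containerd, and Docker should be ignored
print(sorted(effective_features(detected, include=["containerd"], exclude=["docker"])))
# ['containerd', 'kubernetes']
```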
Once Autodiscovery is enabled, the Datadog Agent automatically attempts Autodiscovery for several services, including Apache and Redis, based on default Autodiscovery configuration files.
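Default templates like these are matched to containers by image name. The sketch below derives a short image name from a full image reference by dropping the registry, repository path, tag, and digest. It then looks the name up in a hypothetical table of default templates. The table contents and the exact normalization rules are assumptions for illustration.

```python
DEFAULT_TEMPLATES = {  # hypothetical: short image name -> check name
    "redis": "redisdb",
    "httpd": "apache",
}


def short_image_name(image):
    """'docker.io/library/redis:6.0' -> 'redis' (sketch)."""
    name = image.split("@", 1)[0]   # drop a digest, if any
    name = name.rsplit("/", 1)[-1]  # drop registry (including any port) and path
    return name.split(":", 1)[0]    # drop the tag


def default_check_for(image):
    return DEFAULT_TEMPLATES.get(short_image_name(image))


print(default_check_for("docker.io/library/redis:6.0"))           # redisdb
print(default_check_for("registry.example.com:5000/team/httpd"))  # apache
print(default_check_for("nginx:1.25"))                            # None
```

Stripping the path before the tag matters: a registry port such as :5000 would otherwise be mistaken for a tag.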
You can define an integration template in several forms: as Kubernetes pod annotations, Docker labels, a configuration file mounted within the Agent, a ConfigMap, or a key-value store. See the Autodiscovery Integration Templates documentation for details.
If you are using Autodiscovery and an application is deployed on a new node, metrics may take some time to appear in Datadog. On a new node, the Datadog Agent needs time to collect metadata from your application before it can start reporting.