Datadog discourages using DaemonSets to deploy the Datadog Agent because the manual process is prone to errors. Datadog recommends that you use Datadog Operator or Helm to install the Agent on Kubernetes.
You can use DaemonSets to deploy the Datadog Agent on all your nodes (or on specific nodes by using nodeSelectors).
To install the Datadog Agent on your Kubernetes cluster:
Configure Agent permissions: If your Kubernetes cluster has role-based access control (RBAC) enabled, configure RBAC permissions for your Datadog Agent service account. From Kubernetes 1.6 onwards, RBAC is enabled by default. Create the appropriate ClusterRole, ServiceAccount, and ClusterRoleBinding with the following command:
Note: These RBAC configurations are set for the default namespace. If you are in a custom namespace, update the namespace parameter before applying them.
Create the Datadog Agent manifest. Create the datadog-agent.yaml manifest from one of the following templates:
Note: These manifests are set for the default namespace. If you are in a custom namespace, update the metadata.namespace parameter before applying them.
In the secret-api-key.yaml manifest, replace PUT_YOUR_BASE64_ENCODED_API_KEY_HERE with your Datadog API key encoded in base64. To get the base64 version of your API key, you can run:
echo -n '<Your API key>' | base64
If you are using the datadog-agent-all-features.yaml manifest template: in the secret-cluster-agent-token.yaml manifest, replace PUT_A_BASE64_ENCODED_RANDOM_STRING_HERE with a random string encoded in base64. To get the base64 version of it, you can run:
echo -n 'Random string' | base64
Note: The random string must contain at least 32 alphanumeric characters to secure Cluster Agent to Agent communication.
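One way to produce such a string is sketched below. It assumes a POSIX-like shell with /dev/urandom, tr, cut, and base64 available. The variable names token and encoded are illustrative and do not come from the Datadog manifests.

```shell
# Read a chunk of random bytes and keep only alphanumeric characters.
# 1024 bytes yield roughly 250 such characters, far more than the 32 needed.
token="$(head -c 1024 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9' | cut -c1-32)"

# Base64-encode the token for use in secret-cluster-agent-token.yaml.
encoded="$(printf '%s' "$token" | base64)"

echo "$encoded"
```

The printed value is what would replace PUT_A_BASE64_ENCODED_RANDOM_STRING_HERE. Using printf '%s' instead of echo avoids encoding a trailing newline as part of the token.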
Set your Datadog site to datadoghq.com using the DD_SITE environment variable in the datadog-agent.yaml manifest.
Note: If the DD_SITE environment variable is not explicitly set, it defaults to the US site datadoghq.com. If you are using one of the other sites, this results in an invalid API key message. Use the documentation site selector to see documentation appropriate for the site you’re using.
Deploy the DaemonSet with the command:
kubectl apply -f datadog-agent.yaml
Verification: To verify the Datadog Agent is running in your environment as a DaemonSet, execute:
kubectl get daemonset
If the Agent is deployed, output similar to the text below appears, where DESIRED and CURRENT are equal to the number of nodes running in your cluster.
NAME      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
datadog   2         2         2       2            2           <none>          10s
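The comparison of DESIRED and CURRENT can also be scripted. The following is a minimal sketch, assuming the default column layout of kubectl get daemonset (NAME, DESIRED, and CURRENT as the first three columns). check_daemonset_counts is a hypothetical helper name, not part of kubectl.

```shell
# Reads `kubectl get daemonset` output on stdin and exits non-zero
# if any row has a DESIRED value that differs from CURRENT.
check_daemonset_counts() {
  awk 'BEGIN { mismatch = 0 }
       NR > 1 && $2 != $3 { mismatch = 1 }   # NR > 1 skips the header row
       END { exit mismatch }'
}

# Usage (requires access to the cluster):
#   kubectl get daemonset | check_daemonset_counts && echo "DESIRED matches CURRENT"
```

The column positions shift if you pass flags such as --all-namespaces, which adds a NAMESPACE column in front, so the field numbers in the awk program would need adjusting in that case.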
If you are using Agent version 7.17 or earlier, in addition to the steps above, set the DD_APM_NON_LOCAL_TRAFFIC and DD_APM_ENABLED variables to true in the env section of your datadog.yaml trace Agent manifest:
Warning: The hostPort parameter opens a port on your host. Make sure your firewall only allows access from your applications or trusted sources. If your network plugin doesn’t support hostPorts, add hostNetwork: true in your Agent pod specifications. This shares the network namespace of your host with the Datadog Agent. This also means that all ports opened on the container are opened on the host. If a port is used both on the host and in your container, they conflict (since they share the same network namespace) and the pod does not start. Some Kubernetes installations do not allow this.
To enable APM trace collection over UDS, open the DaemonSet configuration file and edit the following:
This configuration creates a directory on the host and mounts it within the Agent. The Agent then creates and listens on a socket file in that directory with the DD_APM_RECEIVER_SOCKET value of /var/run/datadog/apm.socket. The application pods can then similarly mount this volume and write to this same socket.
Note: Setting DD_CONTAINER_EXCLUDE_LOGS prevents the Datadog Agent from collecting and sending its own logs. Remove this parameter if you want to collect the Datadog Agent logs. See the environment variable for ignoring containers to learn more. When using ImageStreams inside OpenShift environments, set DD_CONTAINER_INCLUDE_LOGS with the container name to collect logs. Both the exclude and include parameters support regular expressions in their values.
Mount the pointerdir volume to prevent loss of container logs during restarts or network issues. Also mount /var/lib/docker/containers so that logs can be collected through the Kubernetes log files, since /var/log/pods is a symlink to this directory:
# (...)
volumeMounts:
  # (...)
  - name: pointerdir
    mountPath: /opt/datadog-agent/run
  - name: logpodpath
    mountPath: /var/log/pods
  # Docker runtime directory, replace this path
  # with your container runtime logs directory,
  # or remove this configuration if `/var/log/pods`
  # is not a symlink to any other directory.
  - name: logcontainerpath
    mountPath: /var/lib/docker/containers
# (...)
volumes:
  # (...)
  - hostPath:
      path: /opt/datadog-agent/run
    name: pointerdir
  - hostPath:
      path: /var/log/pods
    name: logpodpath
  # Docker runtime directory, replace this path
  # with your container runtime logs directory,
  # or remove this configuration if `/var/log/pods`
  # is not a symlink to any other directory.
  - hostPath:
      path: /var/lib/docker/containers
    name: logcontainerpath
  # (...)
The pointerdir is used to store a file with a pointer to all the containers that the Agent is collecting logs from. This is to make sure none are lost when the Agent is restarted, or in the case of a network issue.
where <USER_ID> is the UID used to run the Agent and <DOCKER_GROUP_ID> is the ID of the group that owns the Docker or containerd socket.
When the Agent runs as a non-root user, it cannot directly read the log files in /var/lib/docker/containers. In this case, mount the Docker socket in the Agent container so that the Agent can fetch the container logs from the Docker daemon.
Configuring leader election, as described in the above steps, ensures that only one Cluster Agent collects the events.
Alternatively, to collect the Kubernetes events from a Node Agent, set the environment variables DD_COLLECT_KUBERNETES_EVENTS and DD_LEADER_ELECTION to true in your Agent manifest.
Hostname to use for metrics (if autodetection fails)
DD_TAGS
Host tags separated by spaces. For example: simple-tag-0 tag-key-1:tag-value-1
DD_SITE
Destination site for your metrics, traces, and logs. Your DD_SITE is datadoghq.com. Defaults to datadoghq.com.
DD_DD_URL
Optional setting to override the URL for metric submission.
DD_URL (6.36+/7.36+)
Alias for DD_DD_URL. Ignored if DD_DD_URL is already set.
DD_CHECK_RUNNERS
The Agent runs all checks concurrently by default (default value = 4 runners). To run the checks sequentially, set the value to 1. If you need to run a high number of checks (or slow checks), the collector-queue component might fall behind and fail the healthcheck. You can increase the number of runners to run more checks in parallel.
DD_LEADER_ELECTION
If multiple instances of the Agent are running in your cluster, set this variable to true to avoid the duplication of event collection.
Exclude containers from log collection, metrics collection, and Autodiscovery. Datadog excludes Kubernetes and OpenShift pause containers by default. These allowlists and blocklists apply to Autodiscovery only; traces and DogStatsD are not affected. These environment variables support regular expressions in their values.
Env Variable
Description
DD_CONTAINER_INCLUDE
Allowlist of containers to include (separated by spaces). Use .* to include all. For example: "image:image_name_1 image:image_name_2", image:.*
DD_CONTAINER_EXCLUDE
Blocklist of containers to exclude (separated by spaces). Use .* to exclude all. For example: "image:image_name_3 image:image_name_4", image:.*
DD_CONTAINER_INCLUDE_METRICS
Allowlist of containers whose metrics you wish to include.
DD_CONTAINER_EXCLUDE_METRICS
Blocklist of containers whose metrics you wish to exclude.
DD_CONTAINER_INCLUDE_LOGS
Allowlist of containers whose logs you wish to include.
DD_CONTAINER_EXCLUDE_LOGS
Blocklist of containers whose logs you wish to exclude.
DD_AC_INCLUDE
Deprecated. Allowlist of containers to include (separated by spaces). Use .* to include all. For example: "image:image_name_1 image:image_name_2", image:.*
DD_AC_EXCLUDE
Deprecated. Blocklist of containers to exclude (separated by spaces). Use .* to exclude all. For example: "image:image_name_3 image:image_name_4", image:.* Note: This variable is only honored for Autodiscovery.
Note: The kubernetes.containers.running, kubernetes.pods.running, docker.containers.running, .stopped, .running.total and .stopped.total metrics are not affected by these settings. All containers are counted.
Additional Autodiscovery listeners to run. They are added in addition to the variables defined in the listeners section of the datadog.yaml configuration file.
DD_CONFIG_PROVIDERS
The providers the Agent should call to collect check configurations. Available providers are:
kubelet - Handles templates embedded in pod annotations.
docker - Handles templates embedded in container labels.
clusterchecks - Retrieves cluster-level check configurations from the Cluster Agent.
kube_services - Watches Kubernetes services for cluster checks.
DD_EXTRA_CONFIG_PROVIDERS
Additional Autodiscovery configuration providers to use. They are added in addition to the variables defined in the config_providers section of the datadog.yaml configuration file.
Overrides container source auto-detection to force a single source. Example: "docker", "ecs_fargate", "kubelet". This is no longer needed since Agent v7.35.0.
DD_HEALTH_PORT
Set this to 5555 to expose the Agent health check at port 5555.
DD_CLUSTER_NAME
Set a custom Kubernetes cluster identifier to avoid host alias collisions. The cluster name can be up to 40 characters and must meet the following restrictions: it contains only lowercase letters, numbers, and hyphens; it starts with a letter; and it ends with a number or a letter.
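A candidate name can be checked against these restrictions before deploying. The sketch below encodes only the rules listed for DD_CLUSTER_NAME, assuming a POSIX shell with grep -E. is_valid_cluster_name is a hypothetical helper and is not part of the Agent.

```shell
# Returns 0 if the name satisfies the DD_CLUSTER_NAME restrictions, 1 otherwise.
is_valid_cluster_name() {
  name="$1"
  # At most 40 characters.
  [ "${#name}" -le 40 ] || return 1
  # Starts with a lowercase letter, ends with a lowercase letter or digit,
  # and contains only lowercase letters, digits, and hyphens in between.
  printf '%s\n' "$name" | grep -Eq '^[a-z]([a-z0-9-]*[a-z0-9])?$'
}

if is_valid_cluster_name "prod-cluster-1"; then
  echo "prod-cluster-1 is a valid cluster name"
fi
```

Names such as Prod-cluster (uppercase letter), 1cluster (starts with a digit), or cluster- (ends with a hyphen) are rejected by this check.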