This section documents the specifics of monitoring the Kubernetes Control Plane and provides base configurations for it. You can then customize these configurations to add any Datadog feature.
With Datadog integrations for the API server, Etcd, Controller Manager, and Scheduler, you can collect key metrics from all four components of the Kubernetes Control Plane.
The following configurations are tested on Kubernetes v1.18+.
API server
The API server integration requires no manual configuration: the Datadog Agent discovers it automatically.
Etcd
By providing read access to the Etcd certificates located on the host, the Datadog Agent check can communicate with Etcd and start collecting Etcd metrics.
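As an illustration, the snippet below writes a Helm values file that mounts the Etcd certificates into the Agent pods and configures the check against them. It is a minimal sketch rather than a definitive configuration. It assumes a kubeadm-style layout with certificates under /etc/kubernetes/pki/etcd, Etcd listening on port 2379, and the Datadog Helm chart keys datadog.confd, datadog.ignoreAutoConfig, agents.volumes, agents.volumeMounts, and agents.tolerations. The file name is arbitrary.

```shell
# Sketch only: certificate paths, port, and file name are assumptions; adjust to your cluster.
cat > datadog-values-etcd.yaml <<'EOF'
datadog:
  # Skip the Etcd auto-configuration file packaged with the Agent
  ignoreAutoConfig:
    - etcd
  confd:
    etcd.yaml: |-
      ad_identifiers:
        - etcd
      instances:
        - prometheus_url: https://%%host%%:2379/metrics
          tls_ca_cert: /host/etc/kubernetes/pki/etcd/ca.crt
          tls_cert: /host/etc/kubernetes/pki/etcd/server.crt
          tls_private_key: /host/etc/kubernetes/pki/etcd/server.key
agents:
  volumes:
    - name: etcd-certs
      hostPath:
        path: /etc/kubernetes/pki/etcd
  volumeMounts:
    - name: etcd-certs
      mountPath: /host/etc/kubernetes/pki/etcd
      readOnly: true
  # Let the Agent run on control plane nodes (taint key depends on the Kubernetes version)
  tolerations:
    - effect: NoSchedule
      key: node-role.kubernetes.io/master
      operator: Exists
    - effect: NoSchedule
      key: node-role.kubernetes.io/control-plane
      operator: Exists
EOF
```

You would then pass the file to your Helm release, for example with helm upgrade --install <release name> datadog/datadog -f datadog-values-etcd.yaml.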
Controller Manager and Scheduler
Insecure ports
If the insecure ports of your Controller Manager and Scheduler instances are enabled, the Datadog Agent discovers the integrations and starts collecting metrics without any additional configuration.
Secure ports
Secure ports allow authentication and authorization to protect your Control Plane components. The Datadog Agent can collect Controller Manager and Scheduler metrics by targeting their secure ports.
The ssl_verify field in the kube_controller_manager and kube_scheduler configuration needs to be set to false when using self-signed certificates.
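A check configuration that targets the secure ports could look like the following sketch, written here as a Helm values file. It assumes the default secure ports (10257 for the Controller Manager, 10259 for the Scheduler), self-signed certificates (hence ssl_verify: false), and bearer token authentication. The file name and the use of Helm are assumptions for illustration.

```shell
# Sketch only: ports and file name are assumptions; adjust to your cluster.
cat > datadog-values-secure-ports.yaml <<'EOF'
datadog:
  # Replace the packaged auto-configuration with the explicit instances below
  ignoreAutoConfig:
    - kube_scheduler
    - kube_controller_manager
  confd:
    kube_scheduler.yaml: |-
      ad_identifiers:
        - kube-scheduler
      instances:
        - prometheus_url: https://%%host%%:10259/metrics
          ssl_verify: false        # self-signed certificate
          bearer_token_auth: true  # authenticate with the service account token
    kube_controller_manager.yaml: |-
      ad_identifiers:
        - kube-controller-manager
      instances:
        - prometheus_url: https://%%host%%:10257/metrics
          ssl_verify: false
          bearer_token_auth: true
EOF
```

If your components present certificates signed by a CA that the Agent trusts, ssl_verify can stay at its default instead.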
When targeting secure ports, the bind-address option in your Controller Manager and Scheduler configuration must be reachable by the Datadog Agent. Example:
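One way to make the secure ports reachable on a kubeadm-managed cluster is to set bind-address in the kubeadm ClusterConfiguration. The sketch below assumes kubeadm with the v1beta2 configuration API (map-style extraArgs) and binds both components to all interfaces with 0.0.0.0. Whether that exposure is acceptable depends on your network, so treat the value as an assumption rather than a recommendation.

```shell
# Sketch only: assumes a kubeadm-managed cluster and the kubeadm v1beta2 config API.
cat > kubeadm-config.yaml <<'EOF'
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
controllerManager:
  extraArgs:
    bind-address: 0.0.0.0   # listen on all interfaces so the Agent can reach the secure port
scheduler:
  extraArgs:
    bind-address: 0.0.0.0
EOF
```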
Kubernetes on Amazon EKS
On Amazon Elastic Kubernetes Service (EKS), API server metrics are exposed. The Datadog Agent can therefore collect them using endpoint checks, as described in the Kubernetes API server metrics check documentation. To configure the check, add the following annotations to the default/kubernetes service:
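The annotations can be sketched as a patch file. The values mirror the kube_apiserver_metrics annotations used elsewhere in this document. The file name and the choice of kubectl patch to apply them are assumptions.

```shell
# Sketch only: writes the annotations to a patch file; nothing is applied to the cluster here.
cat > kube-apiserver-annotations.yaml <<'EOF'
metadata:
  annotations:
    ad.datadoghq.com/endpoints.check_names: '["kube_apiserver_metrics"]'
    ad.datadoghq.com/endpoints.init_configs: '[{}]'
    ad.datadoghq.com/endpoints.instances: '[{"prometheus_url": "https://%%host%%:%%port%%/metrics", "bearer_token_auth": "true"}]'
EOF

# To apply (requires cluster access):
#   kubectl patch service kubernetes -n default --patch "$(cat kube-apiserver-annotations.yaml)"
```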
Kubernetes on OpenShift 4
Ensure that you are logged in with sufficient permissions to edit services and create secrets.
API server
The API server runs behind the service kubernetes in the default namespace. Annotate this service with the kube_apiserver_metrics configuration:
oc annotate service kubernetes -n default 'ad.datadoghq.com/endpoints.check_names=["kube_apiserver_metrics"]'
oc annotate service kubernetes -n default 'ad.datadoghq.com/endpoints.init_configs=[{}]'
oc annotate service kubernetes -n default 'ad.datadoghq.com/endpoints.instances=[{"prometheus_url": "https://%%host%%:%%port%%/metrics", "bearer_token_auth": "true"}]'
oc annotate service kubernetes -n default 'ad.datadoghq.com/endpoints.resolve=ip'
The last annotation ad.datadoghq.com/endpoints.resolve is needed because the service is in front of static pods. The Datadog Cluster Agent schedules the checks as endpoint checks and dispatches them to Cluster Check Runners. The nodes they are running on can be identified with:
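One way to inspect the dispatching is to ask the Cluster Agent for its view of cluster and endpoint checks. The helper below is a sketch under assumptions: the function name, the DD_NAMESPACE variable, and its datadog default are hypothetical, and it relies on the agent clusterchecks subcommand being available in the Cluster Agent container.

```shell
# Assumption: the Datadog Cluster Agent runs in this namespace; override as needed.
DD_NAMESPACE="${DD_NAMESPACE:-datadog}"

# Print the Cluster Agent's view of dispatched cluster and endpoint checks.
show_dispatched_checks() {
  # $1: name of a Datadog Cluster Agent pod
  oc exec "$1" -n "$DD_NAMESPACE" -- agent clusterchecks
}

# Example call (requires cluster access):
#   show_dispatched_checks <datadog cluster agent pod>
```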
Etcd
Communicating with the Etcd service requires certificates, which are stored in the secret kube-etcd-client-certs in the openshift-monitoring namespace. To give the Datadog Agent access to these certificates, first copy them into the namespace the Datadog Agent is running in:
oc get secret kube-etcd-client-certs -n openshift-monitoring -o yaml | sed 's/namespace: openshift-monitoring/namespace: <datadog agent namespace>/'| oc create -f -
These certificates should be mounted on the Cluster Check Runner pods by adding the volumes and volumeMounts as below.
Note: Mounts are also included to disable the Etcd check autoconfiguration file packaged with the Agent.
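The mounts could be expressed as Helm values along these lines. This is a sketch that assumes the Datadog Helm chart keys clusterChecksRunner.volumes and clusterChecksRunner.volumeMounts, with the secret copied into the Agent namespace under its original name. The mount path /etc/etcd-certs matches the certificate paths used in the Etcd annotations in this section. The empty directory mounted over conf.d/etcd.d is what hides the packaged autoconfiguration file.

```shell
# Sketch only: file name and volume names are assumptions.
cat > datadog-values-ocp4-etcd.yaml <<'EOF'
clusterChecksRunner:
  volumes:
    - name: etcd-certs
      secret:
        secretName: kube-etcd-client-certs
    # Empty directory that shadows the packaged Etcd auto-configuration
    - name: disable-etcd-autoconf
      emptyDir: {}
  volumeMounts:
    - name: etcd-certs
      mountPath: /etc/etcd-certs
      readOnly: true
    - name: disable-etcd-autoconf
      mountPath: /etc/datadog-agent/conf.d/etcd.d
EOF
```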
Then, annotate the service running in front of Etcd:
oc annotate service etcd -n openshift-etcd 'ad.datadoghq.com/endpoints.check_names=["etcd"]'
oc annotate service etcd -n openshift-etcd 'ad.datadoghq.com/endpoints.init_configs=[{}]'
oc annotate service etcd -n openshift-etcd 'ad.datadoghq.com/endpoints.instances=[{"prometheus_url": "https://%%host%%:%%port%%/metrics", "tls_ca_cert": "/etc/etcd-certs/etcd-client-ca.crt", "tls_cert": "/etc/etcd-certs/etcd-client.crt", "tls_private_key": "/etc/etcd-certs/etcd-client.key"}]'
oc annotate service etcd -n openshift-etcd 'ad.datadoghq.com/endpoints.resolve=ip'
The Datadog Cluster Agent schedules the checks as endpoint checks and dispatches them to Cluster Check Runners.
Controller Manager
The Controller Manager runs behind the service kube-controller-manager in the openshift-kube-controller-manager namespace. Annotate the service with the check configuration:
oc annotate service kube-controller-manager -n openshift-kube-controller-manager 'ad.datadoghq.com/endpoints.check_names=["kube_controller_manager"]'
oc annotate service kube-controller-manager -n openshift-kube-controller-manager 'ad.datadoghq.com/endpoints.init_configs=[{}]'
oc annotate service kube-controller-manager -n openshift-kube-controller-manager 'ad.datadoghq.com/endpoints.instances=[{"prometheus_url": "https://%%host%%:%%port%%/metrics", "ssl_verify": "false", "bearer_token_auth": "true"}]'
oc annotate service kube-controller-manager -n openshift-kube-controller-manager 'ad.datadoghq.com/endpoints.resolve=ip'
The Datadog Cluster Agent schedules the checks as endpoint checks and dispatches them to Cluster Check Runners.
Scheduler
The Scheduler runs behind the service scheduler in the openshift-kube-scheduler namespace. Annotate the service with the check configuration:
oc annotate service scheduler -n openshift-kube-scheduler 'ad.datadoghq.com/endpoints.check_names=["kube_scheduler"]'
oc annotate service scheduler -n openshift-kube-scheduler 'ad.datadoghq.com/endpoints.init_configs=[{}]'
oc annotate service scheduler -n openshift-kube-scheduler 'ad.datadoghq.com/endpoints.instances=[{"prometheus_url": "https://%%host%%:%%port%%/metrics", "ssl_verify": "false", "bearer_token_auth": "true"}]'
oc annotate service scheduler -n openshift-kube-scheduler 'ad.datadoghq.com/endpoints.resolve=ip'
The Datadog Cluster Agent schedules the checks as endpoint checks and dispatches them to Cluster Check Runners.
Kubernetes on OpenShift 3
On OpenShift 3, all control plane components can be monitored using endpoint checks.
Ensure that you are logged in with sufficient permissions to create and edit services.
API server
The API server runs behind the service kubernetes in the default namespace. Annotate this service with the kube_apiserver_metrics configuration:
oc annotate service kubernetes -n default 'ad.datadoghq.com/endpoints.check_names=["kube_apiserver_metrics"]'
oc annotate service kubernetes -n default 'ad.datadoghq.com/endpoints.init_configs=[{}]'
oc annotate service kubernetes -n default 'ad.datadoghq.com/endpoints.instances=[{"prometheus_url": "https://%%host%%:%%port%%/metrics", "bearer_token_auth": "true"}]'
oc annotate service kubernetes -n default 'ad.datadoghq.com/endpoints.resolve=ip'
The last annotation ad.datadoghq.com/endpoints.resolve is needed because the service is in front of static pods. The Datadog Cluster Agent schedules the checks as endpoint checks and dispatches them to Cluster Check Runners. The nodes they are running on can be identified with:
Etcd
Communicating with the Etcd service requires certificates, which are located on the host. Mount these certificates on the Cluster Check Runner pods by adding the volumes and volumeMounts as below.
Note: Mounts are also included to disable the Etcd check autoconfiguration file packaged with the Agent.
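For certificates that live on the host, the mounts could be sketched as Helm values using a hostPath volume. The sketch assumes the Datadog Helm chart keys clusterChecksRunner.volumes and clusterChecksRunner.volumeMounts, and that the certificates sit under /etc/etcd on the node, which is consistent with the /host/etc/etcd paths referenced by the Etcd check annotations in this section. The file and volume names are arbitrary.

```shell
# Sketch only: host path, file name, and volume names are assumptions.
cat > datadog-values-ocp3-etcd.yaml <<'EOF'
clusterChecksRunner:
  volumes:
    - name: etcd-certs
      hostPath:
        path: /etc/etcd
    # Empty directory that shadows the packaged Etcd auto-configuration
    - name: disable-etcd-autoconf
      emptyDir: {}
  volumeMounts:
    - name: etcd-certs
      mountPath: /host/etc/etcd
      readOnly: true
    - name: disable-etcd-autoconf
      mountPath: /etc/datadog-agent/conf.d/etcd.d
EOF
```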
Direct edits of this service are not persisted, so make a copy of the Etcd service:
oc get service etcd -n kube-system -o yaml | sed 's/name: etcd/name: etcd-copy/'| oc create -f -
Annotate the copied service with the check configuration:
oc annotate service etcd-copy -n kube-system 'ad.datadoghq.com/endpoints.check_names=["etcd"]'
oc annotate service etcd-copy -n kube-system 'ad.datadoghq.com/endpoints.init_configs=[{}]'
oc annotate service etcd-copy -n kube-system 'ad.datadoghq.com/endpoints.instances=[{"prometheus_url": "https://%%host%%:%%port%%/metrics", "tls_ca_cert": "/host/etc/etcd/ca/ca.crt", "tls_cert": "/host/etc/etcd/server.crt", "tls_private_key": "/host/etc/etcd/server.key"}]'
oc annotate service etcd-copy -n kube-system 'ad.datadoghq.com/endpoints.resolve=ip'
The Datadog Cluster Agent schedules the checks as endpoint checks and dispatches them to Cluster Check Runners.
Controller Manager and Scheduler
The Controller Manager and Scheduler run behind the same service, kube-controllers in the kube-system namespace. Direct edits of the service are not persisted, so make a copy of the service:
oc get service kube-controllers -n kube-system -o yaml | sed 's/name: kube-controllers/name: kube-controllers-copy/'| oc create -f -
Annotate the copied service with the check configurations:
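Because both components sit behind the same service, each annotation carries two entries, one per check. The sketch below writes them to a patch file. The instance settings mirror the Controller Manager and Scheduler annotations shown for other platforms in this document. The file name and the use of oc patch are assumptions.

```shell
# Sketch only: writes the annotations to a patch file; nothing is applied to the cluster here.
cat > kube-controllers-copy-annotations.yaml <<'EOF'
metadata:
  annotations:
    ad.datadoghq.com/endpoints.check_names: '["kube_controller_manager", "kube_scheduler"]'
    ad.datadoghq.com/endpoints.init_configs: '[{}, {}]'
    ad.datadoghq.com/endpoints.instances: '[{"prometheus_url": "https://%%host%%:%%port%%/metrics", "ssl_verify": "false", "bearer_token_auth": "true"}, {"prometheus_url": "https://%%host%%:%%port%%/metrics", "ssl_verify": "false", "bearer_token_auth": "true"}]'
    ad.datadoghq.com/endpoints.resolve: 'ip'
EOF

# To apply (requires cluster access):
#   oc patch service kube-controllers-copy -n kube-system --patch "$(cat kube-controllers-copy-annotations.yaml)"
```

The order of entries matters: the first instance is paired with kube_controller_manager and the second with kube_scheduler.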
The Datadog Cluster Agent schedules the checks as endpoint checks and dispatches them to Cluster Check Runners.
Kubernetes on Rancher Kubernetes Engine (v2.5+)
Rancher v2.5 relies on PushProx to expose control plane metric endpoints, which allows the Datadog Agent to run control plane checks and collect metrics.
The control plane components run on Docker outside of Kubernetes. Within Kubernetes, the kubernetes service in the default namespace targets the control plane node IP(s). You can confirm this by running $ kubectl describe endpoints kubernetes.
You can annotate this service with endpoint checks (managed by the Datadog Cluster Agent) to monitor the API Server, Controller Manager, and Scheduler:
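A possible set of annotations for the three checks is sketched below as a patch file. It assumes that the Controller Manager and Scheduler still serve metrics over HTTP on their legacy insecure ports (10252 and 10251) on the control plane nodes, and that the API server accepts bearer token authentication. Verify both against your cluster before relying on them. The file name is arbitrary.

```shell
# Sketch only: ports 10252/10251 are assumptions about your control plane setup.
cat > rancher-control-plane-annotations.yaml <<'EOF'
metadata:
  annotations:
    ad.datadoghq.com/endpoints.check_names: '["kube_apiserver_metrics", "kube_controller_manager", "kube_scheduler"]'
    ad.datadoghq.com/endpoints.init_configs: '[{}, {}, {}]'
    ad.datadoghq.com/endpoints.instances: '[{"prometheus_url": "https://%%host%%:%%port%%/metrics", "bearer_token_auth": "true"}, {"prometheus_url": "http://%%host%%:10252/metrics"}, {"prometheus_url": "http://%%host%%:10251/metrics"}]'
EOF

# To apply (requires cluster access):
#   kubectl patch service kubernetes -n default --patch "$(cat rancher-control-plane-annotations.yaml)"
```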
Etcd is run in Docker outside of Kubernetes, and certificates are required to communicate with the Etcd service. The suggested steps to set up Etcd monitoring require SSH access to a control plane node running Etcd.
SSH into the control plane node by following the Rancher documentation. Confirm that Etcd is running in a Docker container with $ docker ps, and then use $ docker inspect etcd to find the location of the certificates used in the run command ("Cmd"), as well as the host path of the mounts.
The three flags in the command to look for are:
--trusted-ca-file
--cert-file
--key-file
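To pick these three flags out of a long argument list, a small filter can help. The function below is a hypothetical helper, not part of any Datadog or Rancher tooling. It reads one argument per line and keeps only the client certificate flags, skipping the similarly named --peer-* flags.

```shell
# Keep only the three client certificate flags from Etcd's arguments (one per line on stdin).
etcd_cert_flags() {
  grep -E '^--(trusted-ca-file|cert-file|key-file)(=|$)'
}

# Example call on the control plane node (requires Docker access):
#   docker inspect etcd --format '{{range .Config.Cmd}}{{println .}}{{end}}' | etcd_cert_flags
```

The printed paths are paths inside the Etcd container. Map them to host paths using the mounts listed in the same docker inspect output.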
Using the mount information available in the $ docker inspect etcd output, set volumes and volumeMounts in the Datadog Agent configuration. Also include tolerations so that the Datadog Agent can run on the control plane nodes.
The following are examples of how to configure the Datadog Agent with Helm and the Datadog Operator:
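A Helm-only sketch of such a configuration follows. The host path is left as a placeholder because it depends on what docker inspect etcd reports on your node. The taint keys node-role.kubernetes.io/etcd and node-role.kubernetes.io/controlplane are assumptions about how Rancher taints these nodes, and the chart keys agents.volumes, agents.volumeMounts, and agents.tolerations are assumed from the Datadog Helm chart.

```shell
# Sketch only: replace <etcd cert host path> with the host path found via docker inspect.
cat > datadog-values-rancher-etcd.yaml <<'EOF'
agents:
  volumes:
    - name: etcd-certs
      hostPath:
        path: <etcd cert host path>
  volumeMounts:
    - name: etcd-certs
      mountPath: /host<etcd cert host path>
      readOnly: true
  # Assumed taints on Etcd and control plane nodes
  tolerations:
    - effect: NoExecute
      key: node-role.kubernetes.io/etcd
      operator: Exists
    - effect: NoSchedule
      key: node-role.kubernetes.io/controlplane
      operator: Exists
EOF
```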
Set up a DaemonSet with a pause container to run the Etcd check on the nodes running Etcd. This DaemonSet runs on the host network so that it can access the Etcd service. It also has the check configuration and the tolerations needed to run on the control plane node(s). Make sure that the mounted certificate file paths match what you set up on your instance, and replace the <...> portion accordingly.
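A manifest along these lines could serve as a starting point. It is a sketch under several assumptions: the name etcd-pause, the pause image tag, port 2379, the certificate paths under /host/etc/kubernetes/ssl, the node label used in nodeSelector, and the taint keys are all illustrative and must be checked against your cluster. The <...> portions are left for you to fill in.

```shell
# Sketch only: names, image, port, paths, label, and taints are assumptions.
cat > etcd-pause-daemonset.yaml <<'EOF'
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: etcd-pause
spec:
  selector:
    matchLabels:
      app: etcd-pause
  template:
    metadata:
      labels:
        app: etcd-pause
      annotations:
        # Check configuration attached to the container named "pause"
        ad.datadoghq.com/pause.check_names: '["etcd"]'
        ad.datadoghq.com/pause.init_configs: '[{}]'
        ad.datadoghq.com/pause.instances: |
          [{
            "prometheus_url": "https://%%host%%:2379/metrics",
            "tls_ca_cert": "/host/etc/kubernetes/ssl/kube-ca.pem",
            "tls_cert": "/host/etc/kubernetes/ssl/kube-etcd-<...>.pem",
            "tls_private_key": "/host/etc/kubernetes/ssl/kube-etcd-<...>-key.pem"
          }]
    spec:
      hostNetwork: true   # share the node's network so %%host%% resolves to the node running Etcd
      nodeSelector:
        node-role.kubernetes.io/etcd: "true"   # assumed label on Etcd nodes
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
      tolerations:
        - effect: NoExecute
          key: node-role.kubernetes.io/etcd
          operator: Exists
        - effect: NoSchedule
          key: node-role.kubernetes.io/controlplane
          operator: Exists
EOF
```

The certificate paths in the annotation are read by the Datadog Agent, not by the pause container, so they must match the paths mounted in the Agent pods.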
To deploy the DaemonSet and the check configuration, run:
kubectl apply -f <filename>
Kubernetes on managed services (AKS, GKE)
On other managed services, such as Azure Kubernetes Service (AKS) and Google Kubernetes Engine (GKE), you cannot access the control plane components. As a result, it is not possible to run the kube_apiserver_metrics, kube_controller_manager, kube_scheduler, or etcd checks in these environments.