Get metrics from the Kubernetes service in real time.
The Kubernetes State Metrics Core check leverages kube-state-metrics version 2+ and includes major performance and tagging improvements compared to the legacy `kubernetes_state` check.
As opposed to the legacy check, with the Kubernetes State Metrics Core check, you no longer need to deploy `kube-state-metrics` in your cluster.
Kubernetes State Metrics Core provides a better alternative to the legacy `kubernetes_state` check as it offers more granular metrics and tags. See the Major Changes and Data Collected for more details.
The Kubernetes State Metrics Core check is included in the Datadog Cluster Agent image, so you don’t need to install anything else on your Kubernetes servers.
In your Helm `values.yaml`, add the following:

    datadog:
      # (...)
      kubeStateMetricsCore:
        enabled: true
To enable the `kubernetes_state_core` check, the setting `spec.features.kubeStateMetricsCore.enabled` must be set to `true` in the DatadogAgent resource:

    kind: DatadogAgent
    apiVersion: datadoghq.com/v2alpha1
    metadata:
      name: datadog
    spec:
      global:
        credentials:
          apiKey: <DATADOG_API_KEY>
      features:
        kubeStateMetricsCore:
          enabled: true

Note: Datadog Operator v0.7.0 or greater is required.
In the original `kubernetes_state` check, several tags have been flagged as deprecated and replaced by new tags. To determine your migration path, check which tags are submitted with your metrics.
In the `kubernetes_state_core` check, only the non-deprecated tags are submitted. Before migrating from `kubernetes_state` to `kubernetes_state_core`, verify that only official tags are used in monitors and dashboards.
Here is the mapping between deprecated tags and the official tags that have replaced them:
deprecated tag | official tag |
---|---|
cluster_name | kube_cluster_name |
container | kube_container_name |
cronjob | kube_cronjob |
daemonset | kube_daemon_set |
deployment | kube_deployment |
hpa | horizontalpodautoscaler |
image | image_name |
job | kube_job |
job_name | kube_job |
namespace | kube_namespace |
phase | pod_phase |
pod | pod_name |
replicaset | kube_replica_set |
replicationcontroller | kube_replication_controller |
statefulset | kube_stateful_set |
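The mapping above can be applied mechanically when auditing monitor and dashboard queries. The following is a minimal sketch, not a Datadog tool: the function name `migrate_tag_keys`, the regular expression, and the sample query are assumptions made for illustration. It rewrites a deprecated tag key only when it stands alone as a key, so metric names and tag values are left untouched.

```python
import re

# Deprecated -> official tag keys, copied from the mapping table above.
DEPRECATED_TO_OFFICIAL = {
    "cluster_name": "kube_cluster_name",
    "container": "kube_container_name",
    "cronjob": "kube_cronjob",
    "daemonset": "kube_daemon_set",
    "deployment": "kube_deployment",
    "hpa": "horizontalpodautoscaler",
    "image": "image_name",
    "job": "kube_job",
    "job_name": "kube_job",
    "namespace": "kube_namespace",
    "phase": "pod_phase",
    "pod": "pod_name",
    "replicaset": "kube_replica_set",
    "replicationcontroller": "kube_replication_controller",
    "statefulset": "kube_stateful_set",
}

# Match a deprecated key only when it stands alone as a tag key: not part of
# a longer identifier or metric name, and not in value position (after ":").
_KEY_PATTERN = re.compile(
    r"(?<![\w.:-])("
    + "|".join(re.escape(key) for key in DEPRECATED_TO_OFFICIAL)
    + r")(?![\w.-])"
)


def migrate_tag_keys(query: str) -> str:
    """Return `query` with deprecated tag keys replaced by official ones."""
    return _KEY_PATTERN.sub(lambda m: DEPRECATED_TO_OFFICIAL[m.group(1)], query)


# Hypothetical legacy query used only to demonstrate the rewrite.
legacy = "avg:kubernetes_state.container.restarts{namespace:web,pod:api-1} by {deployment}"
print(migrate_tag_keys(legacy))
```

Running the function twice gives the same result as running it once, because the official keys never match the pattern. Treat the output as a starting point and review each rewritten query by hand.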
The Kubernetes State Metrics Core check is not backward compatible. Be sure to read the changes carefully before migrating from the legacy `kubernetes_state` check.
- `kubernetes_state.node.by_condition`: The legacy metric `kubernetes_state.nodes.by_condition` is deprecated in favor of this one. Note: This metric is backported into the legacy check, where both metrics (it and the legacy metric it replaces) are available.
- `kubernetes_state.persistentvolume.by_phase`: Replaces the legacy metric `kubernetes_state.persistentvolumes.by_phase`.
- `kubernetes_state.pod.status_phase`: The metric is tagged with pod-level tags, like `pod_name`.
- `kubernetes_state.node.count`: The metric is not tagged with `host` anymore. It aggregates the nodes count by `kernel_version`, `os_image`, `container_runtime_version`, and `kubelet_version`.
- `kubernetes_state.container.waiting` and `kubernetes_state.container.status_report.count.waiting`: These metrics no longer emit a 0 value if no pods are waiting; they only report non-zero values.
- `kube_job`: In `kubernetes_state`, the `kube_job` tag value is the `CronJob` name if the `Job` had `CronJob` as an owner, otherwise it is the `Job` name. In `kubernetes_state_core`, the `kube_job` tag value is always the `Job` name, and a new `kube_cronjob` tag key is added with the `CronJob` name as the tag value. When migrating to `kubernetes_state_core`, it’s recommended to use the new tag or `kube_job:foo*`, where `foo` is the `CronJob` name, for query filters.
- `kubernetes_state.job.succeeded`: In `kubernetes_state`, `kubernetes_state.job.succeeded` was `count` type. In `kubernetes_state_core` it is `gauge` type.
Host or node-level tags no longer appear on cluster-centric metrics. Only metrics relative to an actual node in the cluster, like `kubernetes_state.node.by_condition` or `kubernetes_state.container.restarts`, continue to inherit their respective host or node-level tags.
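The `kube_job` change described above affects how query filters select Jobs created by a CronJob. The sketch below illustrates the `kube_job:foo*` filter with hypothetical tag values; it uses Python's `fnmatchcase` as a stand-in for wildcard matching, which is an assumption and not a description of how Datadog evaluates filters. Jobs created by a CronJob are named after the CronJob plus a suffix, which is why a prefix wildcard selects them.

```python
from fnmatch import fnmatchcase

# Hypothetical kube_job tag values as kubernetes_state_core would submit them:
# always the Job name. Jobs created by a CronJob are named "<cronjob>-<suffix>".
job_tags = [
    "kube_job:backup-28391040",  # created by CronJob "backup"
    "kube_job:backup-28391100",  # created by CronJob "backup"
    "kube_job:backup-verify-1",  # standalone Job, not owned by "backup"
    "kube_job:report-1",
]


def wildcard_filter(cronjob_name: str) -> str:
    """Build the kube_job:<name>* filter suggested for migrated queries."""
    return f"kube_job:{cronjob_name}*"


def select(tags: list[str], tag_filter: str) -> list[str]:
    """Return the tags matched by a shell-style wildcard filter."""
    return [tag for tag in tags if fnmatchcase(tag, tag_filter)]


matched = select(job_tags, wildcard_filter("backup"))
print(matched)
```

In this sample the wildcard also selects `backup-verify-1`, a Job that merely shares the prefix. Filtering on the new `kube_cronjob` tag avoids that ambiguity, since its value is the CronJob name itself.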
To add tags globally, use the `DD_TAGS` environment variable, or use the respective Helm or Operator configurations. Instance-only level tags can be specified by mounting a custom `kubernetes_state_core.yaml` into the Cluster Agent.
In your Helm `values.yaml`:

    datadog:
      kubeStateMetricsCore:
        enabled: true
      tags:
        - "<TAG_KEY>:<TAG_VALUE>"

In the DatadogAgent resource:

    kind: DatadogAgent
    apiVersion: datadoghq.com/v2alpha1
    metadata:
      name: datadog
    spec:
      global:
        credentials:
          apiKey: <DATADOG_API_KEY>
        tags:
          - "<TAG_KEY>:<TAG_VALUE>"
      features:
        kubeStateMetricsCore:
          enabled: true
Metrics like `kubernetes_state.container.memory_limit.total` or `kubernetes_state.node.count` are aggregate counts of groups within a cluster, and host or node-level tags are not added.
Enabling `kubeStateMetricsCore` in your Helm `values.yaml` configures the Agent to ignore the auto configuration file for the legacy `kubernetes_state` check. The goal is to avoid running both checks simultaneously.
If you still want to enable both checks simultaneously for the migration phase, disable the `ignoreLegacyKSMCheck` field in your `values.yaml`.
Note: `ignoreLegacyKSMCheck` makes the Agent only ignore the auto configuration for the legacy `kubernetes_state` check. Custom `kubernetes_state` configurations need to be removed manually.
The Kubernetes State Metrics Core check no longer requires deploying `kube-state-metrics` in your cluster, so you can disable the `kube-state-metrics` deployment that ships with the Datadog Helm Chart. To do this, add the following in your Helm `values.yaml`:

    datadog:
      # (...)
      kubeStateMetricsEnabled: false
Important Note: The Kubernetes State Metrics Core check is an alternative to the legacy `kubernetes_state` check. Datadog recommends not enabling both checks simultaneously to guarantee consistent metrics.
Metric (type) | Description |
---|---|
kubernetes_state.apiservice.condition (gauge) | The current condition of this apiservice. Tags:kube_namespace apiservice condition status . |
kubernetes_state.apiservice.count (gauge) | The current count of apiservices. |
kubernetes_state.configmap.count (gauge) | Number of ConfigMaps. Requires ConfigMaps to be added to Cluster Agent collector. Tags: kube_namespace . |
kubernetes_state.container.cpu_limit (gauge) | The value of CPU limit by a container. Tags:kube_namespace pod_name kube_container_name node resource unit (env service version from standard labels).Shown as cpu |
kubernetes_state.container.cpu_limit.total (gauge) | The total value of CPU limits by all containers in the cluster. Tags:kube_namespace kube_container_name kube_<owner kind> .Shown as cpu |
kubernetes_state.container.cpu_requested (gauge) | The value of CPU requested by a container. Tags:kube_namespace pod_name kube_container_name node resource unit (env service version from standard labels).Shown as cpu |
kubernetes_state.container.cpu_requested.total (gauge) | The total value of CPU requested by all containers in the cluster. Tags:kube_namespace kube_container_name kube_<owner kind> .Shown as cpu |
kubernetes_state.container.gpu_limit (gauge) | The value of GPU limit by a container. Tags:kube_namespace pod_name kube_container_name node resource mig_profile unit (env service version from standard labels). |
kubernetes_state.container.gpu_limit.total (gauge) | The total value of GPU limits by all containers in the cluster. Tags:kube_namespace kube_container_name kube_<owner kind> . |
kubernetes_state.container.gpu_requested (gauge) | The value of GPU requested by a container. Tags:kube_namespace pod_name kube_container_name node resource mig_profile unit (env service version from standard labels). |
kubernetes_state.container.gpu_requested.total (gauge) | The total value of GPU requested by all containers in the cluster. Tags:kube_namespace kube_container_name kube_<owner kind> . |
kubernetes_state.container.memory_limit (gauge) | The value of memory limit by a container. Tags:kube_namespace pod_name kube_container_name node resource unit (env service version from standard labels).Shown as byte |
kubernetes_state.container.memory_limit.total (gauge) | The total value of memory limits by all containers in the cluster. Tags:kube_namespace kube_container_name kube_<owner kind> .Shown as byte |
kubernetes_state.container.memory_requested (gauge) | The value of memory requested by a container. Tags:kube_namespace pod_name kube_container_name node resource unit (env service version from standard labels).Shown as byte |
kubernetes_state.container.memory_requested.total (gauge) | The total value of memory requested by all containers in the cluster. Tags:kube_namespace kube_container_name kube_<owner kind> .Shown as byte |
kubernetes_state.container.network_bandwidth_limit (gauge) | The value of network bandwidth limit for a container. Tags:kube_namespace pod_name kube_container_name node resource unit (env service version from standard labels). |
kubernetes_state.container.network_bandwidth_requested (gauge) | The value of network bandwidth requested by a container. Tags:kube_namespace pod_name kube_container_name node resource unit (env service version from standard labels). |
kubernetes_state.container.ready (gauge) | Describes whether the container's readiness check succeeded. Tags:kube_namespace pod_name kube_container_name (env service version from standard labels). |
kubernetes_state.container.restarts (gauge) | The number of container restarts per container. Tags:kube_namespace pod_name kube_container_name (env service version from standard labels). |
kubernetes_state.container.running (gauge) | Describes whether the container is currently in running state. Tags:kube_namespace pod_name kube_container_name (env service version from standard labels). |
kubernetes_state.container.status_report.count.terminated (gauge) | Describes the reason the container is currently in terminated state. Tags:kube_namespace pod_name kube_container_name reason (env service version from standard labels). |
kubernetes_state.container.status_report.count.waiting (gauge) | Describes the reason the container is currently in waiting state. Tags:kube_namespace pod_name kube_container_name reason (env service version from standard labels). |
kubernetes_state.container.terminated (gauge) | Describes whether the container is currently in terminated state. Tags:kube_namespace pod_name kube_container_name (env service version from standard labels). |
kubernetes_state.container.waiting (gauge) | Describes whether the container is currently in waiting state. Tags:kube_namespace pod_name kube_container_name (env service version from standard labels). |
kubernetes_state.crd.condition (gauge) | The current condition of this custom resource definition. Tags: customresourcedefinition condition status . |
kubernetes_state.crd.count (gauge) | Number of custom resource definitions. |
kubernetes_state.cronjob.count (gauge) | Number of cronjobs. Tags:kube_namespace . |
kubernetes_state.cronjob.duration_since_last_schedule (gauge) | The duration since the last time the cronjob was scheduled. Tags:kube_cronjob kube_namespace (env service version from standard labels). |
kubernetes_state.cronjob.spec_suspend (gauge) | Suspend flag tells the controller to suspend subsequent executions. Tags:kube_namespace kube_cronjob (env service version from standard labels). |
kubernetes_state.daemonset.count (gauge) | Number of DaemonSets. Tags:kube_namespace . |
kubernetes_state.daemonset.daemons_available (gauge) | The number of nodes that should be running the daemon pod and have one or more of the daemon pod running and available. Tags:kube_daemon_set kube_namespace (env service version from standard labels). |
kubernetes_state.daemonset.daemons_unavailable (gauge) | The number of nodes that should be running the daemon pod and have none of the daemon pod running and available. Tags:kube_daemon_set kube_namespace (env service version from standard labels). |
kubernetes_state.daemonset.desired (gauge) | The number of nodes that should be running the daemon pod. Tags:kube_daemon_set kube_namespace (env service version from standard labels). |
kubernetes_state.daemonset.misscheduled (gauge) | The number of nodes that are running a daemon pod but are not supposed to. Tags:kube_daemon_set kube_namespace (env service version from standard labels). |
kubernetes_state.daemonset.ready (gauge) | The number of nodes that should be running the daemon pod and have one or more of the daemon pod running and ready. Tags:kube_daemon_set kube_namespace (env service version from standard labels). |
kubernetes_state.daemonset.scheduled (gauge) | The number of nodes that are running at least one daemon pod and are supposed to. Tags:kube_daemon_set kube_namespace (env service version from standard labels). |
kubernetes_state.daemonset.updated (gauge) | The total number of nodes that are running an updated daemon pod. Tags:kube_daemon_set kube_namespace (env service version from standard labels). |
kubernetes_state.deployment.condition (gauge) | The current status conditions of a deployment. Tags:kube_deployment kube_namespace (env service version from standard labels). |
kubernetes_state.deployment.count (gauge) | Number of deployments. Tags:kube_namespace . |
kubernetes_state.deployment.paused (gauge) | Whether the deployment is paused and will not be processed by the deployment controller. Tags:kube_deployment kube_namespace (env service version from standard labels). |
kubernetes_state.deployment.replicas (gauge) | The number of replicas per deployment. Tags:kube_deployment kube_namespace (env service version from standard labels). |
kubernetes_state.deployment.replicas_available (gauge) | The number of available replicas per deployment. Tags:kube_deployment kube_namespace (env service version from standard labels). |
kubernetes_state.deployment.replicas_desired (gauge) | Number of desired pods for a deployment. Tags:kube_deployment kube_namespace (env service version from standard labels). |
kubernetes_state.deployment.replicas_ready (gauge) | The number of ready replicas per deployment. Tags:kube_deployment kube_namespace (env service version from standard labels). |
kubernetes_state.deployment.replicas_unavailable (gauge) | The number of unavailable replicas per deployment. Tags:kube_deployment kube_namespace (env service version from standard labels). |
kubernetes_state.deployment.replicas_updated (gauge) | The number of updated replicas per deployment. Tags:kube_deployment kube_namespace (env service version from standard labels). |
kubernetes_state.deployment.rollingupdate.max_surge (gauge) | Maximum number of replicas that can be scheduled above the desired number of replicas during a rolling update of a deployment. Tags:kube_deployment kube_namespace (env service version from standard labels). |
kubernetes_state.deployment.rollingupdate.max_unavailable (gauge) | Maximum number of unavailable replicas during a rolling update of a deployment. Tags:kube_deployment kube_namespace (env service version from standard labels). |
kubernetes_state.endpoint.address_available (gauge) | Number of addresses available in endpoint. Tags:endpoint kube_namespace . |
kubernetes_state.endpoint.address_not_ready (gauge) | Number of addresses not ready in endpoint. Tags:endpoint kube_namespace . |
kubernetes_state.endpoint.count (gauge) | Number of endpoints. Tags:kube_namespace . |
kubernetes_state.hpa.condition (gauge) | The condition of this autoscaler. Tags:kube_namespace horizontalpodautoscaler condition status . |
kubernetes_state.hpa.count (gauge) | Number of horizontal pod autoscalers. Tags: kube_namespace . |
kubernetes_state.hpa.current_replicas (gauge) | Current number of replicas of pods managed by this autoscaler. Tags:kube_namespace horizontalpodautoscaler . |
kubernetes_state.hpa.desired_replicas (gauge) | Desired number of replicas of pods managed by this autoscaler. Tags:kube_namespace horizontalpodautoscaler . |
kubernetes_state.hpa.max_replicas (gauge) | Upper limit for the number of pods that can be set by the autoscaler; cannot be smaller than MinReplicas. Tags:kube_namespace horizontalpodautoscaler . |
kubernetes_state.hpa.min_replicas (gauge) | Lower limit for the number of pods that can be set by the autoscaler; defaults to 1. Tags:kube_namespace horizontalpodautoscaler . |
kubernetes_state.hpa.spec_target_metric (gauge) | The metric specifications used by this autoscaler when calculating the desired replica count. Tags:kube_namespace horizontalpodautoscaler metric_name metric_target_type . |
kubernetes_state.hpa.status_target_metric (gauge) | The current metric status used by this autoscaler when calculating the desired replica count. Tags:kube_namespace horizontalpodautoscaler metric_name metric_target_type . |
kubernetes_state.ingress.count (gauge) | Number of ingresses. Tags:kube_namespace . |
kubernetes_state.ingress.path (gauge) | Information about the ingress path. Tags:kube_namespace kube_ingress_path kube_ingress kube_service kube_service_port kube_ingress_host . |
kubernetes_state.initcontainer.restarts (gauge) | The number of restarts for the init container. Tags:kube_namespace pod_name kube_container_name (env service version from standard labels). |
kubernetes_state.initcontainer.waiting (gauge) | Describes whether the init container is currently in waiting state. Tags:kube_namespace pod_name kube_container_name (env service version from standard labels). |
kubernetes_state.job.completion.failed (gauge) | The job has failed its execution. Tags:kube_job or kube_cronjob kube_namespace (env service version from standard labels). |
kubernetes_state.job.completion.succeeded (gauge) | The job has completed its execution. Tags:kube_job or kube_cronjob kube_namespace (env service version from standard labels). |
kubernetes_state.job.count (gauge) | Number of jobs. Tags:kube_namespace kube_cronjob . |
kubernetes_state.job.duration (gauge) | Time elapsed between the start and completion time of the job or the current time if the job is still running. Tags:kube_job kube_namespace (env service version from standard labels). |
kubernetes_state.job.failed (gauge) | The number of pods which reached Phase Failed. Tags:kube_job or kube_cronjob kube_namespace (env service version from standard labels). |
kubernetes_state.job.succeeded (gauge) | The number of pods which reached Phase Succeeded. Tags:kube_job or kube_cronjob kube_namespace (env service version from standard labels). |
kubernetes_state.limitrange.cpu.default (gauge) | Information about CPU limit range usage by constraint. Tags:kube_namespace limitrange type .Shown as cpu |
kubernetes_state.limitrange.cpu.default_request (gauge) | Information about CPU limit range usage by constraint. Tags:kube_namespace limitrange type .Shown as cpu |
kubernetes_state.limitrange.cpu.max (gauge) | Information about CPU limit range usage by constraint. Tags:kube_namespace limitrange type .Shown as cpu |
kubernetes_state.limitrange.cpu.max_limit_request_ratio (gauge) | Information about CPU limit range usage by constraint. Tags:kube_namespace limitrange type .Shown as cpu |
kubernetes_state.limitrange.cpu.min (gauge) | Information about CPU limit range usage by constraint. Tags:kube_namespace limitrange type .Shown as cpu |
kubernetes_state.limitrange.memory.default (gauge) | Information about memory limit range usage by constraint. Tags:kube_namespace limitrange type .Shown as byte |
kubernetes_state.limitrange.memory.default_request (gauge) | Information about memory limit range usage by constraint. Tags:kube_namespace limitrange type .Shown as byte |
kubernetes_state.limitrange.memory.max (gauge) | Information about memory limit range usage by constraint. Tags:kube_namespace limitrange type .Shown as byte |
kubernetes_state.limitrange.memory.max_limit_request_ratio (gauge) | Information about memory limit range usage by constraint. Tags:kube_namespace limitrange type .Shown as byte |
kubernetes_state.limitrange.memory.min (gauge) | Information about memory limit range usage by constraint. Tags:kube_namespace limitrange type .Shown as byte |
kubernetes_state.namespace.count (gauge) | Number of namespaces. Tags:phase . |
kubernetes_state.node.age (gauge) | The time in seconds since the creation of the node. Tags:node .Shown as second |
kubernetes_state.node.by_condition (gauge) | The condition of a cluster node. Tags:condition node status . |
kubernetes_state.node.count (gauge) | Number of nodes. Tags:kernel_version os_image container_runtime_version kubelet_version . |
kubernetes_state.node.cpu_allocatable (gauge) | The allocatable CPU of a node that is available for scheduling. Tags:node resource unit .Shown as cpu |
kubernetes_state.node.cpu_allocatable.total (gauge) | The total allocatable CPU of all nodes in the cluster that is available for scheduling. Shown as cpu |
kubernetes_state.node.cpu_capacity (gauge) | The CPU capacity of a node. Tags:node resource unit .Shown as cpu |
kubernetes_state.node.cpu_capacity.total (gauge) | The total CPU capacity of all nodes in the cluster. Shown as cpu |
kubernetes_state.node.ephemeral_storage_allocatable (gauge) | The allocatable ephemeral-storage of a node that is available for scheduling. Tags:node resource unit . |
kubernetes_state.node.ephemeral_storage_capacity (gauge) | The ephemeral-storage capacity of a node. Tags:node resource unit . |
kubernetes_state.node.gpu_allocatable (gauge) | The allocatable GPU of a node that is available for scheduling. Tags:node resource mig_profile unit . |
kubernetes_state.node.gpu_allocatable.total (gauge) | The total allocatable GPU of all nodes in the cluster that is available for scheduling. |
kubernetes_state.node.gpu_capacity (gauge) | The GPU capacity of a node. Tags:node resource mig_profile unit . |
kubernetes_state.node.gpu_capacity.total (gauge) | The total GPU capacity of all nodes in the cluster. |
kubernetes_state.node.memory_allocatable (gauge) | The allocatable memory of a node that is available for scheduling. Tags:node resource unit .Shown as byte |
kubernetes_state.node.memory_allocatable.total (gauge) | The total allocatable memory of all nodes in the cluster that is available for scheduling. Shown as byte |
kubernetes_state.node.memory_capacity (gauge) | The memory capacity of a node. Tags:node resource unit .Shown as byte |
kubernetes_state.node.memory_capacity.total (gauge) | The total memory capacity of all nodes in the cluster. Shown as byte |
kubernetes_state.node.network_bandwidth_allocatable (gauge) | The allocatable network bandwidth of a node that is available for scheduling. Tags:node resource unit . |
kubernetes_state.node.network_bandwidth_capacity (gauge) | The network bandwidth capacity of a node. Tags:node resource unit . |
kubernetes_state.node.pods_allocatable (gauge) | The number of allocatable pods of a node that are available for scheduling. Tags:node resource unit . |
kubernetes_state.node.pods_capacity (gauge) | The pods capacity of a node. Tags:node resource unit . |
kubernetes_state.node.status (gauge) | Whether the node can schedule new pods. Tags:node status . |
kubernetes_state.pdb.disruptions_allowed (gauge) | Number of pod disruptions that are currently allowed. Tags:kube_namespace poddisruptionbudget . |
kubernetes_state.pdb.pods_desired (gauge) | Minimum desired number of healthy pods. Tags:kube_namespace poddisruptionbudget . |
kubernetes_state.pdb.pods_healthy (gauge) | Current number of healthy pods. Tags:kube_namespace poddisruptionbudget . |
kubernetes_state.pdb.pods_total (gauge) | Total number of pods counted by this disruption budget. Tags:kube_namespace poddisruptionbudget . |
kubernetes_state.persistentvolume.by_phase (gauge) | The phase indicates if a volume is available, bound to a claim, or released by a claim. Tags:persistentvolume storageclass phase . |
kubernetes_state.persistentvolume.capacity (gauge) | Persistentvolume capacity in bytes. Tags:persistentvolume storageclass . |
kubernetes_state.persistentvolumeclaim.access_mode (gauge) | The access mode(s) specified by the persistent volume claim. Tags:kube_namespace persistentvolumeclaim access_mode storageclass . |
kubernetes_state.persistentvolumeclaim.request_storage (gauge) | The capacity of storage requested by the persistent volume claim. Tags:kube_namespace persistentvolumeclaim storageclass . |
kubernetes_state.persistentvolumeclaim.status (gauge) | The phase the persistent volume claim is currently in. Tags:kube_namespace persistentvolumeclaim phase storageclass . |
kubernetes_state.pod.age (gauge) | The time in seconds since the creation of the pod. Tags:node kube_namespace pod_name pod_phase (env service version from standard labels).Shown as second |
kubernetes_state.pod.count (gauge) | Number of Pods. Tags:node kube_namespace kube_<owner kind> . |
kubernetes_state.pod.ready (gauge) | Describes whether the pod is ready to serve requests. Tags:node kube_namespace pod_name condition (env service version from standard labels). |
kubernetes_state.pod.scheduled (gauge) | Describes the status of the scheduling process for the pod. Tags:node kube_namespace pod_name condition (env service version from standard labels). |
kubernetes_state.pod.status_phase (gauge) | The pod's current phase. Tags:node kube_namespace pod_name pod_phase (env service version from standard labels). |
kubernetes_state.pod.tolerations (gauge) | Information about the pod tolerations. |
kubernetes_state.pod.unschedulable (gauge) | Describes the unschedulable status for the pod. Tags:kube_namespace pod_name (env service version from standard labels). |
kubernetes_state.pod.uptime (gauge) | The time in seconds since the pod has been scheduled and acknowledged by the Kubelet. Tags:node kube_namespace pod_name pod_phase (env service version from standard labels). |
kubernetes_state.pod.volumes.persistentvolumeclaims_readonly (gauge) | Describes whether a persistentvolumeclaim is mounted read only. Tags:node kube_namespace pod_name volume persistentvolumeclaim (env service version from standard labels). |
kubernetes_state.replicaset.count (gauge) | Number of ReplicaSets. Tags:kube_namespace kube_deployment . |
kubernetes_state.replicaset.fully_labeled_replicas (gauge) | The number of fully labeled replicas per ReplicaSet. Tags:kube_namespace kube_replica_set (env service version from standard labels). |
kubernetes_state.replicaset.replicas (gauge) | The number of replicas per ReplicaSet. Tags:kube_namespace kube_replica_set (env service version from standard labels). |
kubernetes_state.replicaset.replicas_desired (gauge) | Number of desired pods for a ReplicaSet. Tags:kube_namespace kube_replica_set (env service version from standard labels). |
kubernetes_state.replicaset.replicas_ready (gauge) | The number of ready replicas per ReplicaSet. Tags:kube_namespace kube_replica_set (env service version from standard labels). |
kubernetes_state.replicationcontroller.fully_labeled_replicas (gauge) | The number of fully labeled replicas per ReplicationController. Tags:kube_namespace kube_replication_controller . |
kubernetes_state.replicationcontroller.replicas (gauge) | The number of replicas per ReplicationController. Tags:kube_namespace kube_replication_controller . |
kubernetes_state.replicationcontroller.replicas_available (gauge) | The number of available replicas per ReplicationController. Tags:kube_namespace kube_replication_controller . |
kubernetes_state.replicationcontroller.replicas_desired (gauge) | Number of desired pods for a ReplicationController. Tags:kube_namespace kube_replication_controller . |
kubernetes_state.replicationcontroller.replicas_ready (gauge) | The number of ready replicas per ReplicationController. Tags:kube_namespace kube_replication_controller . |
kubernetes_state.resourcequota.count_configmaps.limit (gauge) | Information about resource quota limits by resource. Tags:kube_namespace resourcequota . |
kubernetes_state.resourcequota.count_configmaps.used (gauge) | Information about resource quota usage by resource. Tags:kube_namespace resourcequota . |
kubernetes_state.resourcequota.count_secrets.limit (gauge) | Information about resource quota limits by resource. Tags:kube_namespace resourcequota . |
kubernetes_state.resourcequota.count_secrets.used (gauge) | Information about resource quota usage by resource. Tags:kube_namespace resourcequota . |
kubernetes_state.resourcequota.pods.limit (gauge) | Information about resource quota limits by resource. Tags:kube_namespace resourcequota . |
kubernetes_state.resourcequota.pods.used (gauge) | Information about resource quota usage by resource. Tags:kube_namespace resourcequota . |
kubernetes_state.resourcequota.requests.cpu.limit (gauge) | Information about resource quota limits by resource. Tags:kube_namespace resourcequota . |
kubernetes_state.resourcequota.requests.cpu.used (gauge) | Information about resource quota usage by resource. Tags:kube_namespace resourcequota . |
kubernetes_state.secret.count (gauge) | Number of Secrets. Requires Secrets to be added to Cluster Agent collector. Tags: kube_namespace . |
kubernetes_state.secret.type (gauge) | The type of the secret. Tags:kube_namespace secret type . |
kubernetes_state.service.count (gauge) | Number of services. Tags:kube_namespace type . |
kubernetes_state.service.type (gauge) | Service types. Tags:kube_namespace kube_service type . |
kubernetes_state.statefulset.count (gauge) | Number of StatefulSets. Tags:kube_namespace . |
kubernetes_state.statefulset.replicas (gauge) | The number of replicas per StatefulSet. Tags:kube_namespace kube_stateful_set (env service version from standard labels). |
kubernetes_state.statefulset.replicas_current (gauge) | The number of current replicas per StatefulSet. Tags:kube_namespace kube_stateful_set (env service version from standard labels). |
kubernetes_state.statefulset.replicas_desired (gauge) | Number of desired pods for a StatefulSet. Tags:kube_namespace kube_stateful_set (env service version from standard labels). |
kubernetes_state.statefulset.replicas_ready (gauge) | The number of ready replicas per StatefulSet. Tags:kube_namespace kube_stateful_set (env service version from standard labels). |
kubernetes_state.statefulset.replicas_updated (gauge) | The number of updated replicas per StatefulSet. Tags:kube_namespace kube_stateful_set (env service version from standard labels). |
kubernetes_state.vpa.count (gauge) | Number of vertical pod autoscalers. Tags: kube_namespace . |
kubernetes_state.vpa.lower_bound (gauge) | Minimum resources the container can use before the VerticalPodAutoscaler updater evicts it. Tags:kube_namespace verticalpodautoscaler kube_container_name resource target_api_version target_kind target_name unit . |
kubernetes_state.vpa.spec_container_maxallowed (gauge) | Maximum resources the VerticalPodAutoscaler can set for containers matching the name. Tags:kube_namespace verticalpodautoscaler kube_container_name resource target_api_version target_kind target_name unit . |
kubernetes_state.vpa.spec_container_minallowed (gauge) | Minimum resources the VerticalPodAutoscaler can set for containers matching the name. Tags:kube_namespace verticalpodautoscaler kube_container_name resource target_api_version target_kind target_name unit . |
kubernetes_state.vpa.target (gauge) | Target resources the VerticalPodAutoscaler recommends for the container. Tags:kube_namespace verticalpodautoscaler kube_container_name resource target_api_version target_kind target_name unit . |
kubernetes_state.vpa.uncapped_target (gauge) | Target resources the VerticalPodAutoscaler recommends for the container ignoring bounds. Tags:kube_namespace verticalpodautoscaler kube_container_name resource target_api_version target_kind target_name unit . |
kubernetes_state.vpa.update_mode (gauge) | Update mode of the VerticalPodAutoscaler. Tags:kube_namespace verticalpodautoscaler target_api_version target_kind target_name update_mode . |
kubernetes_state.vpa.upperbound (gauge) | Maximum resources the container can use before the VerticalPodAutoscaler updater evicts it. Tags:kube_namespace verticalpodautoscaler kube_container_name resource target_api_version target_kind target_name unit . |
Note: You can configure Datadog Standard labels on your Kubernetes objects to get the `env`, `service`, and `version` tags.
The Kubernetes State Metrics Core check does not include any events.
Recommended Label | Tag |
---|---|
app.kubernetes.io/name | kube_app_name |
app.kubernetes.io/instance | kube_app_instance |
app.kubernetes.io/version | kube_app_version |
app.kubernetes.io/component | kube_app_component |
app.kubernetes.io/part-of | kube_app_part_of |
app.kubernetes.io/managed-by | kube_app_managed_by |
helm.sh/chart | helm_chart |
Recommended Label | Tag |
---|---|
topology.kubernetes.io/region | kube_region |
topology.kubernetes.io/zone | kube_zone |
failure-domain.beta.kubernetes.io/region | kube_region |
failure-domain.beta.kubernetes.io/zone | kube_zone |
Datadog Label | Tag |
---|---|
tags.datadoghq.com/env | env |
tags.datadoghq.com/service | service |
tags.datadoghq.com/version | version |
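The three label tables above can be read as a single lookup from a Kubernetes label key to a tag key. The sketch below shows that lookup under stated assumptions: the function name `labels_to_tags`, the sample labels, and the choice to deduplicate and sort the result are illustrative only and do not describe the check's internal behavior.

```python
# Label key -> tag key, copied from the three tables above.
LABEL_TO_TAG = {
    "app.kubernetes.io/name": "kube_app_name",
    "app.kubernetes.io/instance": "kube_app_instance",
    "app.kubernetes.io/version": "kube_app_version",
    "app.kubernetes.io/component": "kube_app_component",
    "app.kubernetes.io/part-of": "kube_app_part_of",
    "app.kubernetes.io/managed-by": "kube_app_managed_by",
    "helm.sh/chart": "helm_chart",
    "topology.kubernetes.io/region": "kube_region",
    "topology.kubernetes.io/zone": "kube_zone",
    "failure-domain.beta.kubernetes.io/region": "kube_region",
    "failure-domain.beta.kubernetes.io/zone": "kube_zone",
    "tags.datadoghq.com/env": "env",
    "tags.datadoghq.com/service": "service",
    "tags.datadoghq.com/version": "version",
}


def labels_to_tags(labels: dict[str, str]) -> list[str]:
    """Map the labels listed in the tables to "key:value" tags.

    Labels that are not in the tables are skipped by this sketch. A set is
    used because two label keys can map to the same tag key.
    """
    tags = {
        f"{LABEL_TO_TAG[key]}:{value}"
        for key, value in labels.items()
        if key in LABEL_TO_TAG
    }
    return sorted(tags)


# Hypothetical labels on a Kubernetes object.
example_labels = {
    "app.kubernetes.io/name": "checkout",
    "tags.datadoghq.com/env": "prod",
    "topology.kubernetes.io/zone": "us-east-1a",
    "failure-domain.beta.kubernetes.io/zone": "us-east-1a",
    "team": "payments",
}
print(labels_to_tags(example_labels))
```

The current and beta topology labels both map to `kube_zone`, so an object that carries both with the same value yields a single tag in this sketch.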
- `kubernetes_state.cronjob.complete`: Tags: `kube_cronjob` `kube_namespace` (`env` `service` `version` from standard labels).
- `kubernetes_state.cronjob.on_schedule_check`: Tags: `kube_cronjob` `kube_namespace` (`env` `service` `version` from standard labels).
- `kubernetes_state.job.complete`: Tags: `kube_job` or `kube_cronjob` `kube_namespace` (`env` `service` `version` from standard labels).
- `kubernetes_state.node.ready`: Tags: `node` `condition` `status`.
- `kubernetes_state.node.out_of_disk`: Tags: `node` `condition` `status`.
- `kubernetes_state.node.disk_pressure`: Tags: `node` `condition` `status`.
- `kubernetes_state.node.network_unavailable`: Tags: `node` `condition` `status`.
- `kubernetes_state.node.memory_pressure`: Tags: `node` `condition` `status`.
Run the Cluster Agent’s `status` subcommand inside your Cluster Agent container and look for `kubernetes_state_core` under the Checks section.
By default, the Kubernetes State Metrics Core check waits 10 seconds for a response from the Kubernetes API server. For large clusters, the request may time out, resulting in missing metrics.
You can avoid this by setting the environment variable `DD_KUBERNETES_APISERVER_CLIENT_TIMEOUT` to a higher value than the default 10 seconds.
Update your `datadog-agent.yaml` with the following configuration:

    apiVersion: datadoghq.com/v2alpha1
    kind: DatadogAgent
    metadata:
      name: datadog
    spec:
      override:
        clusterAgent:
          env:
            - name: DD_KUBERNETES_APISERVER_CLIENT_TIMEOUT
              value: <value_greater_than_10>

Then apply the new configuration:

    kubectl apply -n $DD_NAMESPACE -f datadog-agent.yaml
Update your `datadog-values.yaml` with the following configuration:

    clusterAgent:
      env:
        - name: DD_KUBERNETES_APISERVER_CLIENT_TIMEOUT
          value: <value_greater_than_10>

Then upgrade your Helm chart:

    helm upgrade -f datadog-values.yaml <RELEASE_NAME> datadog/datadog
Need help? Contact Datadog support.