This check monitors Kubernetes Cluster Autoscaler through the Datadog Agent.
Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the Autodiscovery Integration Templates for guidance on applying these instructions.
The Kubernetes Cluster Autoscaler check is included in the Datadog Agent package. No additional installation is needed on your server.
Edit the kubernetes_cluster_autoscaler.d/conf.yaml file, in the conf.d/ folder at the root of your Agent’s configuration directory, to start collecting your kubernetes_cluster_autoscaler performance data. See the sample kubernetes_cluster_autoscaler.d/conf.yaml for all available configuration options.
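For a host-based Agent, a minimal instance could look like the following sketch. It assumes the Cluster Autoscaler metrics endpoint is reachable from the Agent host at localhost on the default port 8085. The host name is an assumption, so substitute the address where your metrics are actually exposed.

```yaml
# kubernetes_cluster_autoscaler.d/conf.yaml (minimal sketch)
init_config:

instances:
    # openmetrics_endpoint is the only required parameter.
    # "localhost" is an assumed address; 8085 and /metrics are the defaults.
  - openmetrics_endpoint: http://localhost:8085/metrics
```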
Make sure that the Prometheus-formatted metrics are exposed in your kubernetes_cluster_autoscaler cluster.
For the Agent to start collecting metrics, the kubernetes_cluster_autoscaler pods need to be annotated.
Kubernetes Cluster Autoscaler has metrics and livenessProbe endpoints that can be accessed on port 8085. These endpoints are located under /metrics and /health-check and provide valuable information about the state of your cluster during scaling operations.
Note: To change the default port, use the --address flag.
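As an illustration of the note above, the fragment below passes --address to the Cluster Autoscaler container. The container name, the image placeholder, the port 9090, and the `:9090` value format are assumptions made for this sketch, so check the flag reference for your Cluster Autoscaler version before relying on it. Whatever port you choose must also be used in the check's endpoint URL.

```yaml
# Fragment of a Cluster Autoscaler pod spec (sketch, not a complete manifest)
containers:
  - name: cluster-autoscaler        # hypothetical container name
    image: '<CLUSTER_AUTOSCALER_IMAGE>'
    command:
      - ./cluster-autoscaler        # assumed binary path
      - --address=:9090             # assumed value; the default port is 8085
    ports:
      - name: app
        containerPort: 9090         # keep in sync with --address
```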
To configure the Cluster Autoscaler to expose metrics, do the following:
a) Enable the /metrics route and expose port 8085 for your Cluster Autoscaler deployment:
ports:
  - name: app
    containerPort: 8085
b) Instruct Prometheus to scrape it by adding the following annotation to your Cluster Autoscaler service:
prometheus.io/scrape: "true"
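Put together, the two steps might look like the following pair of manifest fragments. The resource names, the label, and the selector are hypothetical. Annotation values must be strings in Kubernetes, which is why true is quoted here.

```yaml
# Deployment fragment: expose the metrics port on the container
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler            # hypothetical name
spec:
  template:
    spec:
      containers:
        - name: cluster-autoscaler    # hypothetical container name
          ports:
            - name: app
              containerPort: 8085
---
# Service fragment: let Prometheus discover the endpoint
apiVersion: v1
kind: Service
metadata:
  name: cluster-autoscaler            # hypothetical name
  annotations:
    prometheus.io/scrape: "true"      # quoted: annotation values are strings
spec:
  selector:
    app: cluster-autoscaler           # hypothetical label
  ports:
    - name: app
      port: 8085
      targetPort: app
```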
Note: The listed metrics can only be collected if they are available. Some metrics are generated only when certain actions are performed.
The only parameter required for configuring the kubernetes_cluster_autoscaler check is openmetrics_endpoint. This parameter should be set to the location where the Prometheus-formatted metrics are exposed. The default port is 8085. To configure a different port, use the METRICS_PORT environment variable. In containerized environments, use %%host%% for host autodetection.
apiVersion: v1
kind: Pod
# (...)
metadata:
  name: '<POD_NAME>'
  annotations:
    ad.datadoghq.com/controller.checks: |
      {
        "kubernetes_cluster_autoscaler": {
          "init_config": {},
          "instances": [
            {
              "openmetrics_endpoint": "http://%%host%%:8085/metrics"
            }
          ]
        }
      }
    # (...)
spec:
  containers:
    - name: 'controller'
# (...)
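Because the Cluster Autoscaler commonly runs as a Deployment rather than a bare Pod, the same annotation would then go on the pod template, not on the Deployment's own metadata. The sketch below assumes a container named cluster-autoscaler (a hypothetical name). The segment of the annotation key before .checks has to match the container name.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler              # hypothetical name
spec:
  # (...)
  template:
    metadata:
      annotations:
        # Key format: ad.datadoghq.com/<CONTAINER_NAME>.checks
        ad.datadoghq.com/cluster-autoscaler.checks: |
          {
            "kubernetes_cluster_autoscaler": {
              "init_config": {},
              "instances": [
                {
                  "openmetrics_endpoint": "http://%%host%%:8085/metrics"
                }
              ]
            }
          }
    spec:
      containers:
        - name: cluster-autoscaler      # must match the annotation key
```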
Run the Agent’s status subcommand and look for kubernetes_cluster_autoscaler under the Checks section.
kubernetes_cluster_autoscaler.cluster.cpu.current.cores (gauge) | Current CPU cores usage in the cluster |
kubernetes_cluster_autoscaler.cluster.memory.current.bytes (gauge) | Current memory usage in bytes in the cluster |
kubernetes_cluster_autoscaler.cluster.safe.to.autoscale (gauge) | Indicates whether the cluster is safe to autoscale |
kubernetes_cluster_autoscaler.cpu.limits.cores (gauge) | Total CPU cores limits set for pods in the cluster |
kubernetes_cluster_autoscaler.created.node.groups.count (count) | Total count of node groups created in the cluster |
kubernetes_cluster_autoscaler.deleted.node.groups.count (count) | Total count of node groups deleted in the cluster |
kubernetes_cluster_autoscaler.errors.count (count) | Total count of errors that occurred in the cluster |
kubernetes_cluster_autoscaler.evicted.pods.count (count) | Total count of evicted pods in the cluster |
kubernetes_cluster_autoscaler.failed.scale.ups.count (count) | Total count of failed scale-up operations in the cluster |
kubernetes_cluster_autoscaler.function.duration.seconds.bucket (count) | Duration of a specific function in the cluster (bucket) |
kubernetes_cluster_autoscaler.function.duration.seconds.count (count) | Duration of a specific function in the cluster (count) |
kubernetes_cluster_autoscaler.function.duration.seconds.sum (count) | Duration of a specific function in the cluster (sum) |
kubernetes_cluster_autoscaler.go.gc.duration.seconds.count (count) | A summary of the pause duration of garbage collection cycles. Shown as second |
kubernetes_cluster_autoscaler.go.gc.duration.seconds.quantile (gauge) | A summary of the pause duration of garbage collection cycles. Shown as second |
kubernetes_cluster_autoscaler.go.gc.duration.seconds.sum (count) | A summary of the pause duration of garbage collection cycles. Shown as second |
kubernetes_cluster_autoscaler.go.goroutines (gauge) | Number of goroutines that currently exist |
kubernetes_cluster_autoscaler.go.info (gauge) | Information about the Go environment |
kubernetes_cluster_autoscaler.go.memstats.alloc_bytes (gauge) | Number of bytes allocated and still in use. Shown as byte |
kubernetes_cluster_autoscaler.go.memstats.alloc_bytes.count (count) | Total number of bytes allocated even if freed. Shown as byte |
kubernetes_cluster_autoscaler.go.memstats.buck_hash.sys_bytes (gauge) | Number of bytes used by the profiling bucket hash table. Shown as byte |
kubernetes_cluster_autoscaler.go.memstats.frees.count (count) | Total number of frees |
kubernetes_cluster_autoscaler.go.memstats.gc.sys_bytes (gauge) | Number of bytes used for garbage collection system metadata. Shown as byte |
kubernetes_cluster_autoscaler.go.memstats.heap.alloc_bytes (gauge) | Number of heap bytes allocated and still in use. Shown as byte |
kubernetes_cluster_autoscaler.go.memstats.heap.idle_bytes (gauge) | Number of heap bytes waiting to be used. Shown as byte |
kubernetes_cluster_autoscaler.go.memstats.heap.inuse_bytes (gauge) | Number of heap bytes that are in use. Shown as byte |
kubernetes_cluster_autoscaler.go.memstats.heap.objects (gauge) | Number of allocated objects. Shown as object |
kubernetes_cluster_autoscaler.go.memstats.heap.released_bytes (gauge) | Number of heap bytes released to OS. Shown as byte |
kubernetes_cluster_autoscaler.go.memstats.heap.sys_bytes (gauge) | Number of heap bytes obtained from system. Shown as byte |
kubernetes_cluster_autoscaler.go.memstats.lookups.count (count) | Total number of pointer lookups |
kubernetes_cluster_autoscaler.go.memstats.mallocs.count (count) | Total number of mallocs |
kubernetes_cluster_autoscaler.go.memstats.mcache.inuse_bytes (gauge) | Number of bytes in use by mcache structures. Shown as byte |
kubernetes_cluster_autoscaler.go.memstats.mcache.sys_bytes (gauge) | Number of bytes used for mcache structures obtained from system. Shown as byte |
kubernetes_cluster_autoscaler.go.memstats.mspan.inuse_bytes (gauge) | Number of bytes in use by mspan structures. Shown as byte |
kubernetes_cluster_autoscaler.go.memstats.mspan.sys_bytes (gauge) | Number of bytes used for mspan structures obtained from system. Shown as byte |
kubernetes_cluster_autoscaler.go.memstats.next.gc_bytes (gauge) | Number of heap bytes when next garbage collection will take place. Shown as byte |
kubernetes_cluster_autoscaler.go.memstats.other.sys_bytes (gauge) | Number of bytes used for other system allocations. Shown as byte |
kubernetes_cluster_autoscaler.go.memstats.stack.inuse_bytes (gauge) | Number of bytes in use by the stack allocator. Shown as byte |
kubernetes_cluster_autoscaler.go.memstats.stack.sys_bytes (gauge) | Number of bytes obtained from system for stack allocator. Shown as byte |
kubernetes_cluster_autoscaler.go.memstats.sys_bytes (gauge) | Number of bytes obtained from system. Shown as byte |
kubernetes_cluster_autoscaler.go.threads (gauge) | Number of OS threads created. Shown as thread |
kubernetes_cluster_autoscaler.last.activity (gauge) | Timestamp of the last activity in the cluster |
kubernetes_cluster_autoscaler.max.nodes.count (gauge) | Maximum number of nodes allowed in the cluster |
kubernetes_cluster_autoscaler.memory.limits.bytes (gauge) | Total memory limits set for pods in the cluster |
kubernetes_cluster_autoscaler.nap.enabled (gauge) | Indicates whether Node Auto-Provisioning (NAP) is enabled in the cluster |
kubernetes_cluster_autoscaler.node.groups.count (gauge) | Number of node groups in the cluster |
kubernetes_cluster_autoscaler.nodes.count (gauge) | Number of nodes in the cluster |
kubernetes_cluster_autoscaler.old.unregistered.nodes.removed.count (count) | Total count of old unregistered nodes removed from the cluster |
kubernetes_cluster_autoscaler.scaled.down.gpu.nodes.count (count) | Total count of GPU nodes scaled down in the cluster |
kubernetes_cluster_autoscaler.scaled.down.nodes.count (count) | Total count of nodes scaled down in the cluster |
kubernetes_cluster_autoscaler.scaled.up.gpu.nodes.count (count) | Total count of GPU nodes scaled up in the cluster |
kubernetes_cluster_autoscaler.scaled.up.nodes.count (count) | Total count of nodes scaled up in the cluster |
kubernetes_cluster_autoscaler.skipped.scale.events.count (count) | Total count of skipped scale events in the cluster |
kubernetes_cluster_autoscaler.unneeded.nodes.count (gauge) | Total count of unneeded nodes in the cluster |
kubernetes_cluster_autoscaler.unschedulable.pods.count (gauge) | Number of unschedulable pods in the cluster |
The Kubernetes Cluster Autoscaler integration does not include any events.
kubernetes_cluster_autoscaler.openmetrics.health
Returns CRITICAL if the Agent is unable to connect to the Kubernetes Cluster Autoscaler OpenMetrics endpoint, otherwise returns OK.
Statuses: ok, critical
Need help? Contact Datadog support.