This check monitors Kubernetes Cluster Autoscaler through the Datadog Agent.
Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the Autodiscovery Integration Templates for guidance on applying these instructions.
The Kubernetes Cluster Autoscaler check is included in the Datadog Agent package for Agent 7.55.x and later. No additional installation is needed on your server.
Edit the kubernetes_cluster_autoscaler.d/conf.yaml file in the conf.d/ folder at the root of your Agent’s configuration directory to start collecting your kubernetes_cluster_autoscaler performance data. See the sample kubernetes_cluster_autoscaler.d/conf.yaml for all available configuration options.
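As a minimal sketch, assuming the Agent runs on the same host as the Cluster Autoscaler and the metrics are served on the default port 8085, the file could contain a single instance that sets only the required openmetrics_endpoint parameter. The localhost host name is an assumption; replace it with the address where the metrics are actually reachable.

```yaml
# kubernetes_cluster_autoscaler.d/conf.yaml (sketch)
init_config:

instances:
    # Assumption: metrics are reachable on localhost at the default port 8085.
  - openmetrics_endpoint: http://localhost:8085/metrics
```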
Make sure that the Prometheus-formatted metrics are exposed in your kubernetes_cluster_autoscaler cluster.
For the Agent to start collecting metrics, the kubernetes_cluster_autoscaler pods need to be annotated.
Kubernetes Cluster Autoscaler exposes metrics and livenessProbe endpoints on port 8085. These endpoints are located under /metrics and /health-check and provide valuable information about the state of your cluster during scaling operations.
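The /health-check endpoint can back a Kubernetes liveness probe. The fragment below is a sketch that belongs in the Cluster Autoscaler container spec and assumes the default port; any probe timing settings are left to your own defaults.

```yaml
# Fragment of the Cluster Autoscaler container spec (sketch).
livenessProbe:
  httpGet:
    path: /health-check   # health endpoint described above
    port: 8085            # default metrics and health port
```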
Note: To change the default port, use the --address flag.
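For example, to serve metrics on port 9090 instead, the flag could be passed on the autoscaler command line. This is a sketch: the container name, image placeholder, and port 9090 are hypothetical, the ./cluster-autoscaler binary path is assumed from the upstream example manifests, and both the containerPort and the check's openmetrics_endpoint must be updated to match the new port.

```yaml
# Fragment of a Cluster Autoscaler Deployment pod spec (sketch).
containers:
  - name: cluster-autoscaler            # hypothetical container name
    image: <CLUSTER_AUTOSCALER_IMAGE>   # placeholder; use your own image
    command:
      - ./cluster-autoscaler            # assumed binary path (upstream examples)
      - --address=:9090                 # hypothetical non-default port
    ports:
      - name: app
        containerPort: 9090             # must match the --address port
```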
To configure the Cluster Autoscaler to expose metrics, do the following:
a) Enable access to the /metrics route and expose port 8085 for your Cluster Autoscaler deployment:
ports:
  - name: app
    containerPort: 8085
b) Instruct Prometheus to scrape it by adding the following annotation to your Cluster Autoscaler service:
prometheus.io/scrape: "true"
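Put together, a Service that exposes the metrics port and carries the scrape annotation might look like the sketch below. The Service name, namespace, and selector label are hypothetical and must match your own Cluster Autoscaler deployment. Kubernetes annotation values must be strings, which is why true is quoted.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: cluster-autoscaler          # hypothetical name
  namespace: kube-system            # assumption: namespace where the autoscaler runs
  annotations:
    prometheus.io/scrape: "true"    # annotation values must be strings
spec:
  selector:
    app: cluster-autoscaler         # hypothetical pod label
  ports:
    - name: app
      port: 8085                    # Service port
      targetPort: 8085              # containerPort on the autoscaler pod
```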
Note: The listed metrics can only be collected if they are available. Some metrics are generated only when certain actions are performed.
The only parameter required for configuring the kubernetes_cluster_autoscaler check is openmetrics_endpoint. Set this parameter to the location where the Prometheus-formatted metrics are exposed. The default port is 8085. To configure a different port, use the METRICS_PORT environment variable. In containerized environments, use %%host%% for host autodetection.
apiVersion: v1
kind: Pod
# (...)
metadata:
  name: '<POD_NAME>'
  annotations:
    ad.datadoghq.com/controller.checks: |
      {
        "kubernetes_cluster_autoscaler": {
          "init_config": {},
          "instances": [
            {
              "openmetrics_endpoint": "http://%%host%%:8085/metrics"
            }
          ]
        }
      }
  # (...)
spec:
  containers:
    - name: 'controller'
# (...)
Run the Agent’s status subcommand and look for kubernetes_cluster_autoscaler under the Checks section.
kubernetes_cluster_autoscaler.cluster.cpu.current.cores (gauge) | Current CPU cores usage in the cluster |
kubernetes_cluster_autoscaler.cluster.memory.current.bytes (gauge) | Current memory usage in bytes in the cluster |
kubernetes_cluster_autoscaler.cluster.safe.to.autoscale (gauge) | Indicates whether the cluster is safe to autoscale |
kubernetes_cluster_autoscaler.cpu.limits.cores (gauge) | Total CPU cores limits set for pods in the cluster |
kubernetes_cluster_autoscaler.created.node.groups.count (count) | Total count of node groups created in the cluster |
kubernetes_cluster_autoscaler.deleted.node.groups.count (count) | Total count of node groups deleted in the cluster |
kubernetes_cluster_autoscaler.errors.count (count) | Total count of errors that occurred in the cluster |
kubernetes_cluster_autoscaler.evicted.pods.count (count) | Total count of evicted pods in the cluster |
kubernetes_cluster_autoscaler.failed.scale.ups.count (count) | Total count of failed scale-up operations in the cluster |
kubernetes_cluster_autoscaler.function.duration.seconds.bucket (count) | Duration of a specific function in the cluster (bucket) |
kubernetes_cluster_autoscaler.function.duration.seconds.count (count) | Duration of a specific function in the cluster (count) |
kubernetes_cluster_autoscaler.function.duration.seconds.sum (count) | Duration of a specific function in the cluster (sum) |
kubernetes_cluster_autoscaler.go.gc.duration.seconds.count (count) | A summary of the pause duration of garbage collection cycles. Shown as second |
kubernetes_cluster_autoscaler.go.gc.duration.seconds.quantile (gauge) | A summary of the pause duration of garbage collection cycles. Shown as second |
kubernetes_cluster_autoscaler.go.gc.duration.seconds.sum (count) | A summary of the pause duration of garbage collection cycles. Shown as second |
kubernetes_cluster_autoscaler.go.goroutines (gauge) | Number of goroutines that currently exist |
kubernetes_cluster_autoscaler.go.info (gauge) | Information about the Go environment |
kubernetes_cluster_autoscaler.go.memstats.alloc_bytes (gauge) | Number of bytes allocated and still in use. Shown as byte |
kubernetes_cluster_autoscaler.go.memstats.alloc_bytes.count (count) | Total number of bytes allocated, even if freed. Shown as byte |
kubernetes_cluster_autoscaler.go.memstats.buck_hash.sys_bytes (gauge) | Number of bytes used by the profiling bucket hash table. Shown as byte |
kubernetes_cluster_autoscaler.go.memstats.frees.count (count) | Total number of frees |
kubernetes_cluster_autoscaler.go.memstats.gc.sys_bytes (gauge) | Number of bytes used for garbage collection system metadata. Shown as byte |
kubernetes_cluster_autoscaler.go.memstats.heap.alloc_bytes (gauge) | Number of heap bytes allocated and still in use. Shown as byte |
kubernetes_cluster_autoscaler.go.memstats.heap.idle_bytes (gauge) | Number of heap bytes waiting to be used. Shown as byte |
kubernetes_cluster_autoscaler.go.memstats.heap.inuse_bytes (gauge) | Number of heap bytes that are in use. Shown as byte |
kubernetes_cluster_autoscaler.go.memstats.heap.objects (gauge) | Number of allocated objects. Shown as object |
kubernetes_cluster_autoscaler.go.memstats.heap.released_bytes (gauge) | Number of heap bytes released to the OS. Shown as byte |
kubernetes_cluster_autoscaler.go.memstats.heap.sys_bytes (gauge) | Number of heap bytes obtained from the system. Shown as byte |
kubernetes_cluster_autoscaler.go.memstats.lookups.count (count) | Total number of pointer lookups |
kubernetes_cluster_autoscaler.go.memstats.mallocs.count (count) | Total number of mallocs |
kubernetes_cluster_autoscaler.go.memstats.mcache.inuse_bytes (gauge) | Number of bytes in use by mcache structures. Shown as byte |
kubernetes_cluster_autoscaler.go.memstats.mcache.sys_bytes (gauge) | Number of bytes used for mcache structures obtained from the system. Shown as byte |
kubernetes_cluster_autoscaler.go.memstats.mspan.inuse_bytes (gauge) | Number of bytes in use by mspan structures. Shown as byte |
kubernetes_cluster_autoscaler.go.memstats.mspan.sys_bytes (gauge) | Number of bytes used for mspan structures obtained from the system. Shown as byte |
kubernetes_cluster_autoscaler.go.memstats.next.gc_bytes (gauge) | Number of heap bytes when the next garbage collection will take place. Shown as byte |
kubernetes_cluster_autoscaler.go.memstats.other.sys_bytes (gauge) | Number of bytes used for other system allocations. Shown as byte |
kubernetes_cluster_autoscaler.go.memstats.stack.inuse_bytes (gauge) | Number of bytes in use by the stack allocator. Shown as byte |
kubernetes_cluster_autoscaler.go.memstats.stack.sys_bytes (gauge) | Number of bytes obtained from the system for the stack allocator. Shown as byte |
kubernetes_cluster_autoscaler.go.memstats.sys_bytes (gauge) | Number of bytes obtained from the system. Shown as byte |
kubernetes_cluster_autoscaler.go.threads (gauge) | Number of OS threads created. Shown as thread |
kubernetes_cluster_autoscaler.last.activity (gauge) | Timestamp of the last activity in the cluster |
kubernetes_cluster_autoscaler.max.nodes.count (gauge) | Maximum number of nodes allowed in the cluster |
kubernetes_cluster_autoscaler.memory.limits.bytes (gauge) | Total memory limits set for pods in the cluster |
kubernetes_cluster_autoscaler.nap.enabled (gauge) | Indicates whether Node Auto-Provisioning (NAP) is enabled in the cluster |
kubernetes_cluster_autoscaler.node.groups.count (gauge) | Number of node groups in the cluster |
kubernetes_cluster_autoscaler.nodes.count (gauge) | Number of nodes in the cluster |
kubernetes_cluster_autoscaler.old.unregistered.nodes.removed.count (count) | Total count of old unregistered nodes removed from the cluster |
kubernetes_cluster_autoscaler.scaled.down.gpu.nodes.count (count) | Total count of GPU nodes scaled down in the cluster |
kubernetes_cluster_autoscaler.scaled.down.nodes.count (count) | Total count of nodes scaled down in the cluster |
kubernetes_cluster_autoscaler.scaled.up.gpu.nodes.count (count) | Total count of GPU nodes scaled up in the cluster |
kubernetes_cluster_autoscaler.scaled.up.nodes.count (count) | Total count of nodes scaled up in the cluster |
kubernetes_cluster_autoscaler.skipped.scale.events.count (count) | Total count of skipped scale events in the cluster |
kubernetes_cluster_autoscaler.unneeded.nodes.count (gauge) | Total count of unneeded nodes in the cluster |
kubernetes_cluster_autoscaler.unschedulable.pods.count (gauge) | Number of unschedulable pods in the cluster |
The Kubernetes Cluster Autoscaler integration does not include any events.
kubernetes_cluster_autoscaler.openmetrics.health
Returns CRITICAL if the Agent is unable to connect to the Kubernetes Cluster Autoscaler OpenMetrics endpoint; otherwise returns OK.
Statuses: ok, critical
Need help? Contact Datadog support.