Datadog Cluster Agent

Integration version 5.1.0

Overview

This check monitors the Datadog Cluster Agent through the Datadog Agent.

Setup

Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the Autodiscovery Integration Templates for guidance on applying these instructions.

Installation

The Datadog Cluster Agent check is included in the Datadog Agent package. No additional installation is needed on your server.

Configuration

In most cases, the Datadog Cluster Agent check uses Autodiscovery to configure itself automatically. The check runs in the Datadog Agent pod on the same node as the Cluster Agent pod; it does not run in the Cluster Agent itself.

If you need to configure the check:

  1. Edit the datadog_cluster_agent.d/conf.yaml file, in the conf.d/ folder at the root of your Agent's configuration directory, to start collecting your Datadog Cluster Agent performance data. See the sample datadog_cluster_agent.d/conf.yaml for all available configuration options.

  2. Restart the Agent.
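
The manual configuration step can be sketched as a small script that renders a minimal conf.yaml. This is an illustration only: the `prometheus_url` option name and the example URL are assumptions not stated on this page, so confirm both against the sample datadog_cluster_agent.d/conf.yaml. The helper names `render_conf` and `write_conf` are hypothetical.

```python
from pathlib import Path


def render_conf(prometheus_url: str) -> str:
    """Return the text of a minimal datadog_cluster_agent.d/conf.yaml.

    Assumption: the check reads its metrics endpoint from an instance
    option named `prometheus_url`; verify this against the sample file.
    """
    return (
        "init_config:\n"
        "\n"
        "instances:\n"
        f"  - prometheus_url: {prometheus_url}\n"
    )


def write_conf(conf_dir: Path, prometheus_url: str) -> Path:
    """Write the rendered file under <conf_dir>/datadog_cluster_agent.d/."""
    target = conf_dir / "datadog_cluster_agent.d" / "conf.yaml"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(render_conf(prometheus_url), encoding="utf-8")
    return target


if __name__ == "__main__":
    # Placeholder URL for illustration; replace with your Cluster Agent's
    # actual metrics endpoint.
    print(render_conf("http://localhost:5000/metrics"))
```

Pass your Agent's conf.d/ directory to `write_conf`, then restart the Agent so the new configuration is loaded.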

Validation

Run the Agent's status subcommand and look for datadog_cluster_agent under the Checks section.
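
The validation step can be scripted. The sketch below assumes the `datadog-agent` binary is on the PATH of a host install (the command may differ in containers), and it searches the whole status output for the check name rather than parsing the Checks section. The function names are hypothetical.

```python
import subprocess

CHECK_NAME = "datadog_cluster_agent"


def check_is_listed(status_output: str, check_name: str = CHECK_NAME) -> bool:
    """Return True if any line of the status text mentions the check name."""
    return any(check_name in line for line in status_output.splitlines())


def agent_status() -> str:
    """Run `datadog-agent status` and return its standard output.

    Raises CalledProcessError if the command exits with a non-zero code,
    and FileNotFoundError if the binary is not on the PATH.
    """
    result = subprocess.run(
        ["datadog-agent", "status"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    found = check_is_listed(agent_status())
    print("check found" if found else "check not found")
```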

Data Collected

Metrics

datadog.cluster_agent.admission_webhooks.certificate_expiry
(gauge)
Time left before the certificate expires
Shown as hour
datadog.cluster_agent.admission_webhooks.cws_exec_instrumentation_attempts.count
(count)
CWS exec Instrumentation attempts count
datadog.cluster_agent.admission_webhooks.cws_exec_instrumentation_attempts.sum
(count)
CWS exec Instrumentation attempts sum
datadog.cluster_agent.admission_webhooks.cws_pod_instrumentation_attempts.count
(count)
CWS pod Instrumentation attempts count
datadog.cluster_agent.admission_webhooks.cws_pod_instrumentation_attempts.sum
(count)
CWS pod Instrumentation attempts sum
datadog.cluster_agent.admission_webhooks.library_injection_attempts
(count)
Number of library injection attempts by language
datadog.cluster_agent.admission_webhooks.library_injection_errors
(count)
Number of library injection failures by language
datadog.cluster_agent.admission_webhooks.mutation_attempts
(gauge)
Number of pod mutation attempts by mutation type
datadog.cluster_agent.admission_webhooks.mutation_errors
(gauge)
Number of mutation failures by mutation type
datadog.cluster_agent.admission_webhooks.patcher.attempts
(count)
Number of patch attempts
datadog.cluster_agent.admission_webhooks.patcher.completed
(count)
Number of completed patch attempts
datadog.cluster_agent.admission_webhooks.patcher.errors
(count)
Number of patch errors
datadog.cluster_agent.admission_webhooks.rc_provider.configs
(gauge)
Number of valid remote configurations
datadog.cluster_agent.admission_webhooks.rc_provider.invalid_configs
(gauge)
Number of invalid remote configurations
datadog.cluster_agent.admission_webhooks.reconcile_errors
(gauge)
Number of reconcile errors per controller
datadog.cluster_agent.admission_webhooks.reconcile_success
(gauge)
Number of reconcile successes per controller
Shown as success
datadog.cluster_agent.admission_webhooks.response_duration.count
(count)
Webhook response duration count
datadog.cluster_agent.admission_webhooks.response_duration.sum
(count)
Webhook response duration sum
Shown as second
datadog.cluster_agent.admission_webhooks.validation_attempts
(gauge)
Number of pod validation attempts by validation type
datadog.cluster_agent.admission_webhooks.webhooks_received
(gauge)
Number of webhook requests received
datadog.cluster_agent.aggregator.flush
(count)
Number of metrics/service checks/events flushed by (data_type, state)
datadog.cluster_agent.aggregator.processed
(count)
Number of metrics/service checks/events processed by the aggregator by data type
datadog.cluster_agent.api_requests
(count)
Requests made to the cluster agent API by (handler, status)
Shown as request
datadog.cluster_agent.autodiscovery.errors
(gauge)
Number of Autodiscovery errors
datadog.cluster_agent.autodiscovery.poll_duration.count
(count)
Autodiscovery poll duration count
datadog.cluster_agent.autodiscovery.poll_duration.sum
(count)
Autodiscovery poll duration sum
Shown as second
datadog.cluster_agent.autodiscovery.watched_resources
(gauge)
Number of watched resources (Services and Endpoints)
datadog.cluster_agent.cluster_checks.busyness
(gauge)
Busyness of a node per the number of metrics submitted and average duration of all checks run
datadog.cluster_agent.cluster_checks.configs_dangling
(gauge)
Number of check configurations not dispatched
datadog.cluster_agent.cluster_checks.configs_dispatched
(gauge)
Number of check configurations dispatched by node
datadog.cluster_agent.cluster_checks.configs_info
(gauge)
Information about check configurations dispatched (node and check ID)
datadog.cluster_agent.cluster_checks.failed_stats_collection
(count)
Total number of unsuccessful stats collection attempts
datadog.cluster_agent.cluster_checks.nodes_reporting
(gauge)
Number of node agents reporting
datadog.cluster_agent.cluster_checks.rebalancing_decisions
(count)
Total number of check rebalancing decisions
datadog.cluster_agent.cluster_checks.rebalancing_duration_seconds
(gauge)
Duration of the last execution of the check rebalancing algorithm
Shown as second
datadog.cluster_agent.cluster_checks.successful_rebalancing_moves
(count)
Total number of successful check rebalancing decisions
Shown as check
datadog.cluster_agent.cluster_checks.updating_stats_duration_seconds
(gauge)
Duration of collecting stats from check runners and updating cache
Shown as second
datadog.cluster_agent.datadog.rate_limit_queries.limit
(gauge)
Maximum number of queries to the Datadog API allowed in the period by endpoint
Shown as query
datadog.cluster_agent.datadog.rate_limit_queries.period
(gauge)
Period of rate limiting for the Datadog API by endpoint
Shown as second
datadog.cluster_agent.datadog.rate_limit_queries.remaining
(gauge)
Number of queries to the Datadog API remaining before next reset by endpoint
Shown as query
datadog.cluster_agent.datadog.rate_limit_queries.remaining_min
(gauge)
Minimum number of queries remaining before next reset observed during an expiration interval of 2*refresh period
Shown as query
datadog.cluster_agent.datadog.rate_limit_queries.reset
(gauge)
Number of seconds before next reset applied to the Datadog API by endpoint
Shown as second
datadog.cluster_agent.datadog.requests
(count)
Requests made to Datadog by status
Shown as request
datadog.cluster_agent.endpoint_checks.configs_dispatched
(gauge)
Number of endpoint-check configurations dispatched by node
datadog.cluster_agent.external_metrics
(gauge)
Number of external metrics tagged
datadog.cluster_agent.external_metrics.api_elapsed.count
(count)
Count of API Requests received
datadog.cluster_agent.external_metrics.api_elapsed.sum
(count)
Count of API Requests received
datadog.cluster_agent.external_metrics.api_requests
(gauge)
Count of API Requests received
datadog.cluster_agent.external_metrics.datadog_metrics
(gauge)
The label valid is true if the DatadogMetric CR is valid, false otherwise
datadog.cluster_agent.external_metrics.delay_seconds
(gauge)
Freshness of the metric evaluated from querying Datadog
Shown as second
datadog.cluster_agent.external_metrics.processed_value
(gauge)
Value processed from querying Datadog by metric
datadog.cluster_agent.go.goroutines
(gauge)
Number of goroutines that currently exist
datadog.cluster_agent.go.memstats.alloc_bytes
(gauge)
Number of bytes allocated and still in use
Shown as byte
datadog.cluster_agent.go.threads
(gauge)
Number of OS threads created
Shown as thread
datadog.cluster_agent.kubernetes_apiserver.emitted_events
(count)
Datadog events emitted by the kubernetes_apiserver check
datadog.cluster_agent.kubernetes_apiserver.kube_events
(count)
Kubernetes events processed by the kubernetes_apiserver check
datadog.cluster_agent.language_detection_dca_handler.processed_requests
(count)
The number of process language detection requests processed by the handler
datadog.cluster_agent.language_detection_patcher.patches
(count)
The number of patch requests sent by the patcher to the kube api server
datadog.cluster_agent.secret_backend.elapsed
(gauge)
The elapsed time of secret backend invocation
Shown as millisecond
datadog.cluster_agent.tagger.stored_entities
(gauge)
Number of entities stored in the tagger
datadog.cluster_agent.tagger.updated_entities
(count)
Number of updates made to entities in the tagger
datadog.cluster_agent.workloadmeta.events_received
(count)
Number of events received by workloadmeta
datadog.cluster_agent.workloadmeta.notifications_sent
(count)
Number of notifications sent by workloadmeta to its subscribers
datadog.cluster_agent.workloadmeta.stored_entities
(gauge)
Number of entities stored in workloadmeta
datadog.cluster_agent.workloadmeta.subscribers
(gauge)
Number of workloadmeta subscribers
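
Several metrics above come in `.sum` and `.count` pairs, for example admission_webhooks.response_duration and autodiscovery.poll_duration. Assuming each pair follows the usual convention (`.sum` accumulates the observed values and `.count` the number of observations), dividing the increase in `.sum` by the increase in `.count` over the same interval gives the mean value for that interval. A minimal sketch with a hypothetical helper name:

```python
from typing import Optional


def average_from_pair(sum_delta: float, count_delta: float) -> Optional[float]:
    """Mean observed value over an interval, from a .sum/.count pair.

    sum_delta:   increase of the `.sum` metric over the interval
    count_delta: increase of the `.count` metric over the same interval
    Returns None when nothing was observed, to avoid dividing by zero.
    """
    if count_delta <= 0:
        return None
    return sum_delta / count_delta


if __name__ == "__main__":
    # Example: 48 webhook responses took 12 seconds in total.
    print(average_from_pair(12.0, 48.0))  # 0.25 seconds per response
```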

Events

The Datadog Cluster Agent integration does not include any events.

Service Checks

datadog.cluster_agent.prometheus.health
Returns CRITICAL if the check cannot access the metrics endpoint. Returns OK otherwise.
Statuses: ok, critical
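
The logic of this service check can be approximated outside the Agent to test whether a metrics endpoint is reachable. This sketch is not the integration's implementation: treating any non-200 response as critical is an assumption, and `prometheus_health` is a hypothetical name.

```python
from urllib.error import URLError
from urllib.request import urlopen


def prometheus_health(url: str, timeout: float = 5.0) -> str:
    """Return "ok" if the endpoint answers with HTTP 200, else "critical".

    Connection failures, timeouts, HTTP error responses, and malformed
    URLs all map to "critical".
    """
    try:
        with urlopen(url, timeout=timeout) as response:
            return "ok" if response.status == 200 else "critical"
    except (URLError, OSError, ValueError):
        # HTTPError is a subclass of URLError; ValueError covers
        # URLs with a missing or unknown scheme.
        return "critical"


if __name__ == "__main__":
    # Placeholder URL for illustration only.
    print(prometheus_health("http://localhost:5000/metrics"))
```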

Troubleshooting

Need help? Contact Datadog support.