Argo Workflows

Integration version: 2.3.0

Overview

This check monitors Argo Workflows through the Datadog Agent.

Setup

Follow the instructions below to install and configure this check for an Agent running in your Kubernetes environment. For more information about configuration in containerized environments, see the Autodiscovery Integration Templates for guidance.

Installation

Starting with Agent version 7.53.0, the Argo Workflows check is included in the Datadog Agent package. No additional installation is needed in your environment.

This check uses OpenMetrics to collect metrics from the OpenMetrics endpoint that Argo Workflows exposes.

Configuration

The Argo Workflows workflow controller exposes Prometheus-formatted metrics at /metrics on port 9090. For the Agent to start collecting metrics, the workflow controller pods must be annotated. For more information about annotations, see the Autodiscovery Integration Templates for guidance. Additional configuration options are listed in the sample argo_workflows.d/conf.yaml.

The only parameter required to configure the Argo Workflows check is:

  • openmetrics_endpoint: Set this parameter to the location where the Prometheus-formatted metrics are exposed. The default port is 9090. In containerized environments, use %%host%% for host autodetection.

apiVersion: v1
kind: Pod
# (...)
metadata:
  name: '<POD_NAME>'
  annotations:
    ad.datadoghq.com/argo-workflows.checks: |
      {
        "argo_workflows": {
          "init_config": {},
          "instances": [
            {
              "openmetrics_endpoint": "http://%%host%%:9090/metrics"
            }
          ]
        }
      }
    # (...)
spec:
  containers:
    - name: 'argo-workflows'
# (...)
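
When the check is configured through a file rather than pod annotations, the same instance can be written in argo_workflows.d/conf.yaml. The following is a minimal sketch, not the full sample file. <CONTROLLER_HOST> is a placeholder introduced here for illustration (it does not come from this page) and stands for the address where the workflow controller's metrics endpoint is reachable.

```yaml
# argo_workflows.d/conf.yaml (minimal sketch, not the complete sample file)
init_config:

instances:
    # <CONTROLLER_HOST> is a hypothetical placeholder; 9090 is the default metrics port.
  - openmetrics_endpoint: "http://<CONTROLLER_HOST>:9090/metrics"
```

Only openmetrics_endpoint is set because it is the single required parameter. Any other option should be taken from the sample argo_workflows.d/conf.yaml rather than from this sketch.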

Log collection

Available for Agent versions 6.0 and later

Argo Workflows logs can be collected from the different Argo Workflows pods through Kubernetes. Log collection is disabled by default in the Datadog Agent. To enable it, see Kubernetes Log Collection.

See the Autodiscovery Integration Templates for guidance on applying the parameters below.

Parameter       Value
<LOG_CONFIG>    {"source": "argo_workflows", "service": "<SERVICE_NAME>"}
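
With Autodiscovery on Kubernetes, a log configuration like the <LOG_CONFIG> value above is typically attached to the pod as a logs annotation, wrapped in a JSON list. The following is a minimal sketch that assumes the container is named argo-workflows, matching the metrics annotation shown earlier on this page. If your container has a different name, the annotation key must change to match it.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: '<POD_NAME>'
  annotations:
    # Assumes the container is named "argo-workflows"; adjust the key to your container name.
    ad.datadoghq.com/argo-workflows.logs: '[{"source": "argo_workflows", "service": "<SERVICE_NAME>"}]'
spec:
  containers:
    - name: 'argo-workflows'
```

The annotation has no effect until log collection is enabled in the Agent, since it is disabled by default.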

Validation

Run the Agent's status subcommand and look for argo_workflows under the Checks section.

Data Collected

Metrics

argo_workflows.cronworkflows.concurrencypolicy_triggered.count
(count)
Number of times concurrency policy triggered in cron workflows
argo_workflows.cronworkflows.triggered.count
(count)
Number of cron workflows triggered
argo_workflows.current_workflows
(gauge)
Number of Workflows currently accessible by the controller by status (refreshed every 15s)
argo_workflows.deprecated.feature
(gauge)
Indicates usage of deprecated features
argo_workflows.error.count
(count)
Number of errors encountered by the controller by cause
Shown as error
argo_workflows.go.gc.duration.seconds.count
(count)
The summary count of garbage collection cycles in the Argo Workflows instance
argo_workflows.go.gc.duration.seconds.quantile
(gauge)
The pause duration of garbage collection cycles in the Argo Workflows instance by quantile
argo_workflows.go.gc.duration.seconds.sum
(count)
The sum of the pause duration of garbage collection cycles in the Argo Workflows instance
Shown as second
argo_workflows.go.goroutines
(gauge)
Number of goroutines that currently exist.
argo_workflows.go.info
(gauge)
Information about the Go environment.
argo_workflows.go.memstats.alloc_bytes
(gauge)
Number of bytes allocated and still in use.
Shown as byte
argo_workflows.go.memstats.alloc_bytes.count
(count)
Total number of bytes allocated, even if freed.
Shown as byte
argo_workflows.go.memstats.buck_hash.sys_bytes
(gauge)
Number of bytes used by the profiling bucket hash table.
Shown as byte
argo_workflows.go.memstats.frees.count
(count)
Total number of frees.
argo_workflows.go.memstats.gc.sys_bytes
(gauge)
Number of bytes used for garbage collection system metadata.
Shown as byte
argo_workflows.go.memstats.heap.alloc_bytes
(gauge)
Number of heap bytes allocated and still in use.
Shown as byte
argo_workflows.go.memstats.heap.idle_bytes
(gauge)
Number of heap bytes waiting to be used.
Shown as byte
argo_workflows.go.memstats.heap.inuse_bytes
(gauge)
Number of heap bytes that are in use.
Shown as byte
argo_workflows.go.memstats.heap.objects
(gauge)
Number of allocated objects.
argo_workflows.go.memstats.heap.released_bytes
(gauge)
Number of heap bytes released to OS.
Shown as byte
argo_workflows.go.memstats.heap.sys_bytes
(gauge)
Number of heap bytes obtained from system.
Shown as byte
argo_workflows.go.memstats.last_gc_time_seconds
(gauge)
Number of seconds since 1970 of last garbage collection.
Shown as second
argo_workflows.go.memstats.lookups.count
(count)
Total number of pointer lookups.
argo_workflows.go.memstats.mallocs.count
(count)
Total number of mallocs.
argo_workflows.go.memstats.mcache.inuse_bytes
(gauge)
Number of bytes in use by mcache structures.
Shown as byte
argo_workflows.go.memstats.mcache.sys_bytes
(gauge)
Number of bytes used for mcache structures obtained from system.
Shown as byte
argo_workflows.go.memstats.mspan.inuse_bytes
(gauge)
Number of bytes in use by mspan structures.
Shown as byte
argo_workflows.go.memstats.mspan.sys_bytes
(gauge)
Number of bytes used for mspan structures obtained from system.
Shown as byte
argo_workflows.go.memstats.next.gc_bytes
(gauge)
Number of heap bytes when next garbage collection will take place.
Shown as byte
argo_workflows.go.memstats.other.sys_bytes
(gauge)
Number of bytes used for other system allocations.
Shown as byte
argo_workflows.go.memstats.stack.inuse_bytes
(gauge)
Number of bytes in use by the stack allocator.
Shown as byte
argo_workflows.go.memstats.stack.sys_bytes
(gauge)
Number of bytes obtained from system for stack allocator.
Shown as byte
argo_workflows.go.memstats.sys_bytes
(gauge)
Number of bytes obtained from system.
Shown as byte
argo_workflows.go.threads
(gauge)
Number of OS threads created.
argo_workflows.is_leader
(gauge)
Indicates if the current instance is the leader
argo_workflows.k8s_request.count
(count)
Number of Kubernetes requests executed. https://argo-workflows.readthedocs.io/en/release-3.5/metrics/#argoworkflowsk8srequesttotal
Shown as request
argo_workflows.k8s_request.duration.bucket
(count)
Count of Kubernetes request durations split into buckets by upper bounds
argo_workflows.k8s_request.duration.count
(count)
Total count of Kubernetes request durations
argo_workflows.k8s_request.duration.sum
(count)
Sum of Kubernetes request durations
Shown as second
argo_workflows.log_messages.count
(count)
Total number of log messages.
Shown as message
argo_workflows.operation_duration_seconds.bucket
(count)
The count of observations in the histogram of durations of operations split into buckets by upper bound.
argo_workflows.operation_duration_seconds.count
(count)
The total count of observations in the histogram of durations of operations
argo_workflows.operation_duration_seconds.sum
(count)
Total time in seconds spent on operations
Shown as second
argo_workflows.pod.pending.count
(count)
Number of pending pods
argo_workflows.pods
(gauge)
Number of Pods from Workflows currently accessible by the controller by status (refreshed every 15s)
argo_workflows.pods_total.count
(count)
Total count of pods
argo_workflows.queue.duration.bucket
(count)
Count of queue durations split into buckets by upper bounds
argo_workflows.queue.duration.count
(count)
Total count of queue durations
argo_workflows.queue.duration.sum
(count)
Sum of queue durations
Shown as second
argo_workflows.queue.longest_running
(gauge)
Duration of the longest running queue
argo_workflows.queue.retries.count
(count)
Number of queue retries
argo_workflows.queue.unfinished_work
(gauge)
Unfinished work in the queue
argo_workflows.queue_adds.count
(count)
Adds to the queue
argo_workflows.queue_depth
(gauge)
Depth of the queue
argo_workflows.queue_latency.bucket
(count)
The count of observations for the time that objects spend waiting in the queue. Split into buckets by upper bounds
argo_workflows.queue_latency.count
(count)
The total count of observations for the time that objects spend waiting in the queue.
argo_workflows.queue_latency.sum
(count)
The total time that objects spend waiting in the queue.
Shown as second
argo_workflows.total.count
(count)
Total count of workflows
argo_workflows.version
(gauge)
Argo Workflows version
argo_workflows.workers_busy
(gauge)
Number of workers currently busy
Shown as worker
argo_workflows.workflow_condition
(gauge)
Workflow condition. https://argo-workflows.readthedocs.io/en/release-3.5/metrics/#argoworkflowsworkflow_condition
argo_workflows.workflows_processed.count
(count)
Number of workflow updates processed
argo_workflows.workflowtemplate.runtime
(gauge)
Runtime of the workflow template
argo_workflows.workflowtemplate.triggered.count
(count)
Number of times workflow templates triggered

Events

The Argo Workflows integration does not include any events.

Service Checks

argo_workflows.openmetrics.health
Returns CRITICAL if the check cannot access the OpenMetrics metrics endpoint of Argo Workflows.
Statuses: ok, critical

Troubleshooting

Need help? Contact Datadog support.

Further Reading

Additional helpful documentation, links, and articles:
