Overview
This check monitors Flux through the Datadog Agent. Flux is an open and extensible set of continuous and progressive delivery solutions for Kubernetes.
Setup
Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the Autodiscovery Integration Templates for guidance on applying these instructions.
Installation
Starting from Agent release 7.51.0, the Fluxcd check is included in the Datadog Agent package. No additional installation is needed on your server.
For older versions of the Agent, use these steps to install the integration.
Configuration
This integration supports collecting metrics and logs from the following Flux services:
helm-controller
kustomize-controller
notification-controller
source-controller
You can monitor any subset of these services, depending on your needs.
Metric collection
The following example configures the check through Kubernetes annotations on a Flux pod. See the sample configuration file for all available configuration options.
apiVersion: v1
kind: Pod
metadata:
  name: '<POD_NAME>'
  annotations:
    ad.datadoghq.com/manager.checks: |-
      {
        "fluxcd": {
          "instances": [
            {
              "openmetrics_endpoint": "http://%%host%%:8080/metrics"
            }
          ]
        }
      }
    # (...)
spec:
  containers:
    - name: 'manager'
# (...)
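Flux controllers normally run as Deployments rather than bare pods, so in practice the annotation tends to be set on the Deployment's pod template. The fragment below is a minimal sketch of that placement for monitoring only the source-controller. The `flux-system` namespace and the `source-controller` Deployment name are assumptions based on a default Flux installation; the container name `manager` and the endpoint are taken from the example above. It is a partial manifest meant to show where the annotation goes, not a complete Deployment.

```yaml
# Sketch only: partial Deployment showing where the annotation goes.
# The name and namespace assume a default Flux install; adjust to your cluster.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: source-controller
  namespace: flux-system
spec:
  template:
    metadata:
      annotations:
        # "manager" must match the container name in the pod spec below.
        ad.datadoghq.com/manager.checks: |-
          {
            "fluxcd": {
              "instances": [
                {
                  "openmetrics_endpoint": "http://%%host%%:8080/metrics"
                }
              ]
            }
          }
    spec:
      containers:
        - name: manager
          # (...)
```

Repeating the same annotation on the pod template of each other controller you care about (helm-controller, kustomize-controller, notification-controller) would extend collection to those services.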
Log collection
Available for Agent versions >6.0
Flux logs can be collected from the different Flux pods through Kubernetes. Collecting logs is disabled by default in the Datadog Agent. To enable it, see Kubernetes Log Collection.
See the Autodiscovery Integration Templates for guidance on applying the parameters below.
Parameter | Value |
---|---|
<LOG_CONFIG> | {"source": "fluxcd", "service": "<SERVICE_NAME>"} |
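As an illustration, the `<LOG_CONFIG>` value above could be applied as a pod annotation. This sketch assumes the container is named `manager`, as in the metric collection example, and uses the Autodiscovery `ad.datadoghq.com/<CONTAINER_NAME>.logs` annotation, whose value is a JSON list of log configurations. `<POD_NAME>` and `<SERVICE_NAME>` remain placeholders to fill in.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: '<POD_NAME>'
  annotations:
    # The value is a JSON list containing the <LOG_CONFIG> object.
    # "manager" is assumed to match the container name below.
    ad.datadoghq.com/manager.logs: '[{"source": "fluxcd", "service": "<SERVICE_NAME>"}]'
spec:
  containers:
    - name: 'manager'
```

Using a distinct service name per controller (for example, one per Flux service listed under Configuration) is one way to keep their logs separable, though the choice of names is up to you.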
Validation
Run the Agent’s status subcommand and look for fluxcd under the Checks section.
Data Collected
Metrics
fluxcd.controller.runtime.active.workers (gauge) | Number of currently used workers per controller. Shown as worker |
fluxcd.controller.runtime.max.concurrent.reconciles (gauge) | Maximum number of concurrent reconciles per controller. |
fluxcd.controller.runtime.reconcile.count (count) | Total number of reconciliations per controller. |
fluxcd.controller.runtime.reconcile.errors.count (count) | Total number of reconciliation errors per controller. Shown as error |
fluxcd.controller.runtime.reconcile.time.seconds.bucket (count) | Bucket of length of time per reconciliation per controller. |
fluxcd.controller.runtime.reconcile.time.seconds.count (count) | Count of length of time per reconciliation per controller. |
fluxcd.controller.runtime.reconcile.time.seconds.sum (count) | Sum of length of time per reconciliation per controller. Shown as second |
fluxcd.gotk.reconcile.condition (gauge) | The current condition status of a GitOps Toolkit resource reconciliation. |
fluxcd.gotk.reconcile.duration.seconds.bucket (count) | Bucket of the duration in seconds of a GitOps Toolkit resource reconciliation. |
fluxcd.gotk.reconcile.duration.seconds.count (count) | Count of the duration in seconds of a GitOps Toolkit resource reconciliation. |
fluxcd.gotk.reconcile.duration.seconds.sum (count) | Sum of the duration in seconds of a GitOps Toolkit resource reconciliation. Shown as second |
fluxcd.gotk.suspend.status (gauge) | The current suspend status of a GitOps Toolkit resource. |
fluxcd.leader_election_master_status (gauge) | Whether the reporting system is master of the relevant lease: 0 indicates backup, 1 indicates master. 'name' is the string used to identify the lease. Make sure to group by name. |
fluxcd.process.cpu_seconds.count (count) | Total user and system CPU time spent in seconds. Shown as second |
fluxcd.process.max_fds (gauge) | Maximum number of open file descriptors. |
fluxcd.process.open_fds (gauge) | Number of open file descriptors. |
fluxcd.process.resident_memory (gauge) | Resident memory size in bytes. Shown as byte |
fluxcd.process.start_time (gauge) | Start time of the process since unix epoch in seconds. Shown as second |
fluxcd.process.virtual_memory (gauge) | Virtual memory size in bytes. Shown as byte |
fluxcd.process.virtual_memory.max (gauge) | Maximum amount of virtual memory available in bytes. Shown as byte |
fluxcd.rest_client_requests.count (count) | Number of HTTP requests, partitioned by status code, method, and host. Shown as request |
fluxcd.workqueue.adds.count (count) | Total number of adds handled by a workqueue. |
fluxcd.workqueue.depth (gauge) | Current depth of a workqueue. |
fluxcd.workqueue.longest_running_processor (gauge) | The number of seconds that the longest running processor for a workqueue has been running. Shown as second |
fluxcd.workqueue.retries.count (count) | Total number of retries handled by workqueue. |
fluxcd.workqueue.unfinished_work (gauge) | The number of seconds of in-progress work that has not yet been observed by work_duration. Large values indicate stuck threads. The number of stuck threads can be deduced from the rate at which this value increases. Shown as second |
Events
The fluxcd integration does not include any events.
Service Checks
fluxcd.openmetrics.health
Returns CRITICAL if the check cannot access the OpenMetrics metrics endpoint of Fluxcd.
Statuses: ok, critical
Troubleshooting
Need help? Contact Datadog support.
Further Reading
Additional helpful documentation, links, and articles: