Linkerd

Supported OS: Linux, Windows, Mac OS

Integration version: 6.1.0

Linkerd Integration

Overview

Linkerd is a light but powerful open-source service mesh with CNCF graduated status. It provides the tools you need to write secure, reliable, observable cloud-native applications. With minimal configuration and no application changes, Linkerd:

Uses mutual TLS to transparently secure all on-cluster TCP communication.
Adds latency-aware load balancing, request retries, timeouts, and blue-green deploys to keep your applications resilient.
Provides platform health metrics by tracking success rates, latencies, and request volumes for every meshed workload.

This integration sends your Linkerd metrics to Datadog, including application success rates, latency, and saturation.

Setup

This OpenMetrics-based integration has a latest mode (enabled by setting openmetrics_endpoint to point to the target endpoint) and a legacy mode (enabled by setting prometheus_url instead). To get all the most up-to-date features, Datadog recommends enabling the latest mode. For more information, see Latest and Legacy Versioning For OpenMetrics-based Integrations.

Metrics marked as [OpenMetrics V1] or [OpenMetrics V2] are only available using the corresponding mode of the Linkerd integration. Metrics not marked are collected by all modes.

Installation

The Linkerd check is included in the Datadog Agent package, so you don’t need to install anything else on your server.

Configuration

Host

To configure this check for an Agent running on a host:

Edit the linkerd.d/conf.yaml file, in the conf.d/ folder at the root of your Agent’s configuration directory. See the sample linkerd.d/conf.yaml for all available configuration options using the latest OpenMetrics check example. If you previously implemented this integration, see the legacy example.
Restart the Agent.
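
As a rough illustration of the steps above, a minimal linkerd.d/conf.yaml in latest mode might look like the sketch below. The localhost address is an assumption (an Agent running on the same host as a Linkerd v2 proxy); the port and paths are the ones listed in the Containerized parameters on this page. Treat the sample linkerd.d/conf.yaml as the authoritative list of options.

```yaml
# linkerd.d/conf.yaml (minimal sketch, not the full sample file)
init_config:

instances:
  # Latest mode: set openmetrics_endpoint.
  # "localhost" is an assumed address; replace it with your proxy's host.
  - openmetrics_endpoint: http://localhost:4191/metrics

  # Linkerd v1 exposes its metrics on a different port and path:
  # - openmetrics_endpoint: http://localhost:9990/admin/metrics/prometheus

  # Legacy mode uses prometheus_url instead of openmetrics_endpoint:
  # - prometheus_url: http://localhost:4191/metrics
```

Only one of the two keys is needed per instance; which key is present selects the mode described under Setup.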

Containerized

For containerized environments, see the Autodiscovery Integration Templates for guidance on applying the parameters below.

Linkerd v1

Parameter	Value
`<INTEGRATION_NAME>`	`linkerd`
`<INIT_CONFIG>`	blank or `{}`
`<INSTANCE_CONFIG>`	`{"openmetrics_endpoint": "http://%%host%%:9990/admin/metrics/prometheus"}`

Note: This is a new default OpenMetrics check example. If you previously implemented this integration, see the legacy example.

Linkerd v2

Parameter	Value
`<INTEGRATION_NAME>`	`linkerd`
`<INIT_CONFIG>`	blank or `{}`
`<INSTANCE_CONFIG>`	`{"openmetrics_endpoint": "http://%%host%%:4191/metrics"}`

Note: This is a new default OpenMetrics check example. If you previously implemented this integration, see the legacy example.
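
On Kubernetes, one way to apply the Linkerd v2 parameters above is through Autodiscovery pod annotations. The fragment below is a sketch under two assumptions: that the injected sidecar container is named linkerd-proxy, and that annotation-based Autodiscovery is in use in your cluster. Adjust the container identifier if your setup differs.

```yaml
# Pod template metadata (fragment). The container identifier
# "linkerd-proxy" is an assumption about the injected sidecar's name.
metadata:
  annotations:
    ad.datadoghq.com/linkerd-proxy.check_names: '["linkerd"]'
    ad.datadoghq.com/linkerd-proxy.init_configs: '[{}]'
    ad.datadoghq.com/linkerd-proxy.instances: '[{"openmetrics_endpoint": "http://%%host%%:4191/metrics"}]'
```

The three annotation values correspond to `<INTEGRATION_NAME>`, `<INIT_CONFIG>`, and `<INSTANCE_CONFIG>` in the table, each wrapped in a JSON list.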

Log collection

Collecting logs is disabled by default in the Datadog Agent. To enable it, see Kubernetes log collection.

Parameter	Value
`<LOG_CONFIG>`	`{"source": "linkerd", "service": "<SERVICE_NAME>"}`
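
The log parameter above can be applied in the same way as the check parameters. The fragment below is a sketch that assumes the container to collect logs from is named linkerd-proxy and that log collection is already enabled in the Agent; `<SERVICE_NAME>` is the placeholder from the table and must be replaced with your own value.

```yaml
# Pod template metadata (fragment). "linkerd-proxy" is an assumed
# container name; <SERVICE_NAME> is a placeholder to fill in.
metadata:
  annotations:
    ad.datadoghq.com/linkerd-proxy.logs: '[{"source": "linkerd", "service": "<SERVICE_NAME>"}]'
```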

To increase the verbosity of the data plane logs, see Modifying the Proxy Log Level.

Validation

Run the Agent’s status subcommand and look for linkerd under the Checks section.

Data Collected

Metrics


linkerd.control.request.count (count)	[OpenMetrics V2] Total count of control HTTP requests. Shown as request
linkerd.control.request_total (count)	[OpenMetrics V1] Total count of control HTTP requests. Shown as request
linkerd.control.response.count (count)	[OpenMetrics V2] Total count of control HTTP responses. Shown as response
linkerd.control.response_latency.count (gauge)	Number of control responses on which the linkerd.control.response_latency.sum metric is evaluated. Shown as response
linkerd.control.response_latency.sum (gauge)	Elapsed times between a control request’s headers being received and its response stream completing. Shown as millisecond
linkerd.control.response_total (count)	[OpenMetrics V1] Total count of control HTTP responses. Shown as response
linkerd.control.retry_skipped.count (count)	[OpenMetrics V2] Total count of retryable control HTTP responses that were not retried. Shown as response
linkerd.control.retry_skipped_total (count)	[OpenMetrics V1] Total count of retryable control HTTP responses that were not retried. Shown as response
linkerd.jvm.fd_count (gauge)	(only available on Unix-based OS) A gauge of the number of open file descriptors (Linkerd v1 only). Shown as unit
linkerd.jvm.gc.ConcurrentMarkSweep.cycles (gauge)	A gauge for ConcurrentMarkSweep of the total number of collections that have occurred (Linkerd v1 only). Shown as unit
linkerd.jvm.gc.ConcurrentMarkSweep.msec (gauge)	A gauge for ConcurrentMarkSweep of the total elapsed time the garbage collection pool spent doing collections, in milliseconds (Linkerd v1 only). Shown as millisecond
linkerd.jvm.gc.ParNew.cycles (gauge)	A gauge for ParNew of the total number of collections that have occurred (Linkerd v1 only). Shown as unit
linkerd.jvm.gc.ParNew.msec (gauge)	A gauge for ParNew of the total elapsed time the garbage collection pool spent doing collections, in milliseconds (Linkerd v1 only). Shown as millisecond
linkerd.jvm.gc.cycles (gauge)	A gauge of the total number of collections that have occurred (Linkerd v1 only). Shown as unit
linkerd.jvm.gc.eden.pause_msec.quantile (gauge)	Stats of the durations, in milliseconds, of the eden collection pauses (Linkerd v1 only). Shown as millisecond
linkerd.jvm.gc.msec (gauge)	A gauge of the total elapsed time doing collections, in milliseconds (Linkerd v1 only). Shown as millisecond
linkerd.jvm.heap.committed (gauge)	For the heap used for object allocation, a gauge of the amount of memory, in bytes, committed for the JVM to use (Linkerd v1 only). Shown as byte
linkerd.jvm.heap.max (gauge)	For the heap used for object allocation, a gauge of the maximum amount of memory, in bytes, that can be used by the JVM (Linkerd v1 only). Shown as byte
linkerd.jvm.heap.used (gauge)	For the heap used for object allocation, a gauge of the current amount of memory used, in bytes (Linkerd v1 only). Shown as byte
linkerd.jvm.mem.current.CMS_Old_Gen.used (gauge)	A gauge of the current memory used, in bytes, for the CMS_Old_Gen memory pool (Linkerd v1 only). Shown as byte
linkerd.jvm.mem.current.Par_Eden_Space.used (gauge)	A gauge of the current memory used, in bytes, for the Par_Eden_Space memory pool (Linkerd v1 only). Shown as byte
linkerd.jvm.mem.current.Par_Survivor_Space.used (gauge)	A gauge of the current memory used, in bytes, for the Par_Survivor_Space memory pool (Linkerd v1 only). Shown as byte
linkerd.jvm.nonheap.committed (gauge)	For the non-heap memory, a gauge of the amount of memory, in bytes, committed for the JVM to use (Linkerd v1 only). Shown as byte
linkerd.jvm.nonheap.max (gauge)	For the non-heap memory, a gauge of the maximum amount of memory, in bytes, that can be used by the JVM (Linkerd v1 only). Shown as byte
linkerd.jvm.nonheap.used (gauge)	For the non-heap memory, a gauge of the current amount of memory used, in bytes (Linkerd v1 only). Shown as byte
linkerd.jvm.num_cpus (gauge)	A gauge of the number of processors available to the JVM (Linkerd v1 only). Shown as core
linkerd.jvm.start_time (gauge)	A gauge of the start time of the Java virtual machine in milliseconds since the epoch (Linkerd v1 only). Shown as millisecond
linkerd.jvm.thread.count (gauge)	A gauge of the number of live threads including both daemon and non-daemon threads (Linkerd v1 only). Shown as thread
linkerd.jvm.uptime (gauge)	A gauge of the uptime of the Java virtual machine in milliseconds (Linkerd v1 only). Shown as millisecond
linkerd.openmetrics.health (gauge)	[OpenMetrics V2] Whether the check is able to connect to the metrics endpoint.
linkerd.process.cpu_seconds.count (count)	[OpenMetrics V2] Total user and system CPU time spent in seconds. Shown as second
linkerd.process.cpu_seconds_total (count)	[OpenMetrics V1] Total user and system CPU time spent in seconds. Shown as second
linkerd.process.max_fds (gauge)	Maximum number of open file descriptors. Shown as file
linkerd.process.open_fds (gauge)	Number of open file descriptors. Shown as file
linkerd.process.resident_memory (gauge)	Resident memory size in bytes. Shown as byte
linkerd.process.start_time (gauge)	Time that the process started (in seconds since the UNIX epoch). Shown as second
linkerd.process.virtual_memory (gauge)	Virtual memory size in bytes. Shown as byte
linkerd.prometheus.health (gauge)	Whether the check is able to connect to the metrics endpoint.
linkerd.request.count (count)	[OpenMetrics V2] Total count of HTTP requests. Shown as request
linkerd.request_total (count)	[OpenMetrics V1] Total count of HTTP requests. Shown as request
linkerd.response.count (count)	[OpenMetrics V2] Total count of HTTP responses. Shown as response
linkerd.response_latency.count (gauge)	Number of responses on which the linkerd.response_latency.sum metric is evaluated. Shown as response
linkerd.response_latency.sum (gauge)	Elapsed times between a request’s headers being received and its response stream completing. Shown as millisecond
linkerd.response_total (count)	[OpenMetrics V1] Total count of HTTP responses. Shown as response
linkerd.retry_skipped.count (count)	[OpenMetrics V2] Total count of retryable HTTP responses that were not retried. Shown as response
linkerd.retry_skipped_total (count)	[OpenMetrics V1] Total count of retryable HTTP responses that were not retried. Shown as response
linkerd.route.actual_request.count (count)	[OpenMetrics V2] Total count of actual route HTTP requests. Shown as request
linkerd.route.actual_request_total (count)	[OpenMetrics V1] Total count of actual route HTTP requests. Shown as request
linkerd.route.actual_response.count (count)	[OpenMetrics V2] Total count of actual route HTTP responses. Shown as response
linkerd.route.actual_response_latency.count (gauge)	Number of responses on which the linkerd.route.actual_response_latency.sum metric is evaluated. Shown as millisecond
linkerd.route.actual_response_latency.sum (gauge)	Elapsed times between an actual route request’s headers being received and its response stream completing. Shown as millisecond
linkerd.route.actual_response_total (count)	[OpenMetrics V1] Total count of actual route HTTP responses. Shown as response
linkerd.route.actual_retry_skipped.count (count)	[OpenMetrics V2] Total count of retryable actual route HTTP responses that were not retried. Shown as response
linkerd.route.actual_retry_skipped_total (count)	[OpenMetrics V1] Total count of retryable actual route HTTP responses that were not retried. Shown as response
linkerd.route.request.count (count)	[OpenMetrics V2] Total count of route HTTP requests. Shown as request
linkerd.route.request_total (count)	[OpenMetrics V1] Total count of route HTTP requests. Shown as request
linkerd.route.response.count (count)	[OpenMetrics V2] Total count of route HTTP responses. Shown as response
linkerd.route.response_latency.count (gauge)	Number of responses on which the linkerd.route.response_latency.sum metric is evaluated. Shown as response
linkerd.route.response_latency.sum (gauge)	Elapsed times between a route request’s headers being received and its response stream completing. Shown as millisecond
linkerd.route.response_total (count)	[OpenMetrics V1] Total count of route HTTP responses. Shown as response
linkerd.route.retry_skipped.count (count)	[OpenMetrics V2] Total count of retryable route HTTP responses that were not retried. Shown as response
linkerd.route.retry_skipped_total (count)	[OpenMetrics V1] Total count of retryable route HTTP responses that were not retried. Shown as response
linkerd.rt.client.connections (rate)	Number of active connections for the client (Linkerd v1 only). Shown as connection
linkerd.rt.client.connects_s (rate)	Number of connections per second for the client (Linkerd v1 only). Shown as connection
linkerd.rt.client.pool_cached (gauge)	A gauge of the number of connections cached for the client (Linkerd v1 only). Shown as connection
linkerd.rt.client.pool_num_too_many_waiters (gauge)	A counter of the number of times there were no connections immediately available and there were already too many waiters (Linkerd v1 only). Shown as unit
linkerd.rt.client.pool_num_waited (gauge)	A counter of the number of times there were no connections immediately available and the client waited for a connection (Linkerd v1 only). Shown as unit
linkerd.rt.client.pool_size (gauge)	A gauge of the number of connections that are currently alive, either in use or not (Linkerd v1 only). Shown as connection
linkerd.rt.client.pool_waiters (gauge)	A gauge of the number of clients waiting on connections (Linkerd v1 only). Shown as unit
linkerd.rt.client.request_latency_ms.quantile (gauge)	Stats of the latency of requests in milliseconds for the client (Linkerd v1 only). Shown as millisecond
linkerd.rt.client.requests_s (rate)	Number of requests per second received by the client (Linkerd v1 only).
linkerd.rt.client.status.1XX_s (rate)	Number of requests per second returning a 1XX status code for the client (Linkerd v1 only). Shown as unit
linkerd.rt.client.status.2XX_s (rate)	Number of requests per second returning a 2XX status code for the client (Linkerd v1 only). Shown as unit
linkerd.rt.client.status.3XX_s (rate)	Number of requests per second returning a 3XX status code for the client (Linkerd v1 only). Shown as unit
linkerd.rt.client.status.4XX_s (rate)	Number of requests per second returning a 4XX status code for the client (Linkerd v1 only). Shown as unit
linkerd.rt.client.status.5XX_s (rate)	Number of requests per second returning a 5XX status code for the client (Linkerd v1 only). Shown as unit
linkerd.rt.client.success_s (rate)	Number of successes per second for the client (Linkerd v1 only).
linkerd.rt.server.connections (gauge)	Number of active connections for the server (Linkerd v1 only). Shown as connection
linkerd.rt.server.connects_s (rate)	Number of connections per second for the server (Linkerd v1 only). Shown as connection
linkerd.rt.server.request_latency_ms.quantile (gauge)	Stats of the latency of requests in milliseconds for the server (Linkerd v1 only). Shown as millisecond
linkerd.tcp.close.count (count)	[OpenMetrics V2] Total count of closed connections. Shown as connection
linkerd.tcp.close_total (count)	[OpenMetrics V1] Total count of closed connections. Shown as connection
linkerd.tcp.connection_duration.count (gauge)	Number of connections on which the linkerd.tcp.connection_duration.sum metric is evaluated. Shown as connection
linkerd.tcp.connection_duration.sum (gauge)	Connection lifetimes. Shown as millisecond
linkerd.tcp.open.count (count)	[OpenMetrics V2] Total count of opened connections. Shown as connection
linkerd.tcp.open_connections (gauge)	Number of currently-open connections. Shown as connection
linkerd.tcp.open_total (count)	[OpenMetrics V1] Total count of opened connections. Shown as connection
linkerd.tcp.read_bytes.count (count)	[OpenMetrics V2] Total count of bytes read from peers. Shown as byte
linkerd.tcp.read_bytes_total (count)	[OpenMetrics V1] Total count of bytes read from peers. Shown as byte
linkerd.tcp.write_bytes.count (count)	[OpenMetrics V2] Total count of bytes written to peers. Shown as byte
linkerd.tcp.write_bytes_total (count)	[OpenMetrics V1] Total count of bytes written to peers. Shown as byte

Service Checks

linkerd.prometheus.health

Returns CRITICAL if the Agent fails to connect to the Prometheus endpoint, otherwise OK.

Statuses: ok, critical

Troubleshooting

Need help? Contact Datadog support.