TCP Queue Length

Supported OS: Linux

Overview

This check monitors the usage of the Linux TCP receive and send queues. It can detect if a TCP receive or send queue is full for individual containers.

Setup

Installation

tcp_queue_length is a core Agent 6/7 check that relies on an eBPF component implemented in system-probe. Agent version 7.24.1/6.24.1 or later is required.

The eBPF program used by system-probe is compiled at runtime and requires access to the kernel headers that match the running kernel.

On Debian-like distributions, install the kernel headers like this:

apt install -y linux-headers-$(uname -r)

On RHEL-like distributions, install the kernel headers like this:

yum install -y kernel-headers-$(uname -r)
yum install -y kernel-devel-$(uname -r)

Note: Windows is not supported, and neither are CentOS/RHEL versions earlier than 8.

Configuration

Enabling the tcp_queue_length integration requires turning it on in both system-probe and the core Agent.

Inside the system-probe.yaml configuration file, set the following parameter:

system_probe_config:
  enable_tcp_queue_length: true

1. Edit the tcp_queue_length.d/conf.yaml file, in the conf.d/ folder at the root of your Agent’s configuration directory to start collecting your tcp_queue_length performance data. See the sample tcp_queue_length.d/conf.yaml for all available configuration options.

2. Restart the Agent.
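
For step 1, a minimal tcp_queue_length.d/conf.yaml might look like the sketch below. It assumes the check needs no instance-level options and uses the generic Agent check layout (init_config plus a single empty instance). Treat it as an illustration and confirm against the sample file shipped with your Agent version.

```yaml
# tcp_queue_length.d/conf.yaml (minimal sketch; assumes no instance options are needed)
init_config:

instances:
  # A single empty instance is assumed to be enough to schedule the check
  - {}
```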

Configuration with Helm

With the Datadog Helm chart, the system-probe must be activated by setting datadog.systemProbe.enabled to true in the values.yaml file. Then, the check can be activated by setting the datadog.systemProbe.enableTCPQueueLength parameter.
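
As a rough illustration, the two settings named above would look something like this in values.yaml. This is only a fragment, not a complete values file. It assumes enableTCPQueueLength takes the value true, and it omits any other settings your deployment needs.

```yaml
# values.yaml (fragment) for the Datadog Helm chart
datadog:
  systemProbe:
    # Activate system-probe, which hosts the eBPF program
    enabled: true
    # Activate the tcp_queue_length check
    enableTCPQueueLength: true
```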

Configuration with the Operator (v1.0.0+)

Set the features.tcpQueueLength.enabled parameter in the DatadogAgent manifest:

apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog
spec:
  features:
    tcpQueueLength:
      enabled: true

Note: When using COS (Container Optimized OS), override the src volume in the node Agent:

apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog
spec:
  features:
    tcpQueueLength:
      enabled: true
  override:
    nodeAgent:
      volumes: 
      - emptyDir: {}
        name: src

Validation

Run the Agent’s status subcommand and look for tcp_queue_length under the checks section.

Data Collected

Metrics

tcp_queue.read_buffer_max_usage_pct (gauge)
Maximum usage of read buffer in percent across all open connections.
Shown as percent.

tcp_queue.write_buffer_max_usage_pct (gauge)
Maximum usage of write buffer in percent across all open connections.
Shown as percent.

Service Checks

The TCP Queue Length check does not include any service checks.

Events

The TCP Queue Length check does not include any events.

Troubleshooting

Need help? Contact Datadog support.
