DogStatsD works by sending metrics generated from your application to the Agent over a transport protocol. This protocol can be UDP (User Datagram Protocol) or UDS (Unix Domain Socket).
When DogStatsD is used to send a large volume of metrics to a single Agent, if proper measures are not taken, it is common to end up with the following symptoms:
High Agent CPU usage
Dropped datagrams / metrics
The DogStatsD client library (UDS) returning errors
Most of the time the symptoms can be alleviated by tweaking some configuration options described below.
Some StatsD and DogStatsD clients, by default, send one metric per datagram. This adds considerable overhead on the client, the operating system, and the Agent. If your client supports buffering multiple metrics in one datagram, enabling this option brings noticeable improvements.
If you are using a community-supported DogStatsD client that supports buffering, make sure to configure a max datagram size that does not exceed the Agent-side per-datagram buffer size (8KB by default, configurable on the Agent with dogstatsd_buffer_size) and the network/OS max datagram size.
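To illustrate what buffering does, the sketch below packs newline-separated metric lines into datagrams that never exceed a chosen maximum size. It is a minimal illustration, not the implementation used by any official client: `pack_datagrams` and its `max_size` parameter are hypothetical names, and the 512-byte limit is chosen only to keep the example small.

```python
from typing import Iterable, List


def pack_datagrams(lines: Iterable[str], max_size: int) -> List[bytes]:
    """Group metric lines into newline-separated datagrams of at most max_size bytes."""
    datagrams: List[bytes] = []
    current = b""
    for line in lines:
        encoded = line.encode("utf-8")
        if len(encoded) > max_size:
            raise ValueError(f"metric line exceeds max datagram size: {line!r}")
        # The +1 accounts for the newline that separates two metrics.
        if current and len(current) + 1 + len(encoded) > max_size:
            datagrams.append(current)
            current = encoded
        elif current:
            current += b"\n" + encoded
        else:
            current = encoded
    if current:
        datagrams.append(current)
    return datagrams


# 100 gauges fit in a handful of datagrams instead of 100 separate ones.
metrics = [f"example_metric.gauge_{i}:{i}|g|#env:dev" for i in range(100)]
packets = pack_datagrams(metrics, max_size=512)
print(len(metrics), "metrics packed into", len(packets), "datagrams")
```

Each resulting `bytes` object could then be written to the Agent with a single socket send, which is where the savings come from.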
By default, Datadog’s official Golang library DataDog/datadog-go uses buffering. The size of each packet and the number of messages use different default values for UDS and UDP. See DataDog/datadog-go for more information about the client configuration.
package main

import (
	"log"

	"github.com/DataDog/datadog-go/v5/statsd"
)

func main() {
	// In this example, metrics are buffered by default with the correct default configuration for UDP.
	statsd, err := statsd.New("127.0.0.1:8125")
	if err != nil {
		log.Fatal(err)
	}

	statsd.Gauge("example_metric.gauge", 1, []string{"env:dev"}, 1)
}
By using Datadog’s official Python library datadogpy, the example below uses a buffered DogStatsD client that sends metrics in a minimal number of packets. With buffering, automatic flushing is performed at the packet size limit and every 300ms (configurable).
from datadog import DogStatsd

# If using client v0.43.0+
dsd = DogStatsd(host="127.0.0.1", port=8125, disable_buffering=False)
dsd.gauge('example_metric.gauge_1', 123, tags=["environment:dev"])
dsd.gauge('example_metric.gauge_2', 1001, tags=["environment:dev"])
dsd.flush()  # Optional manual flush

# If using client before v0.43.0, context manager is needed to use buffering
dsd = DogStatsd(host="127.0.0.1", port=8125)
with dsd:
    dsd.gauge('example_metric.gauge_1', 123, tags=["environment:dev"])
    dsd.gauge('example_metric.gauge_2', 1001, tags=["environment:dev"])
By default, Python DogStatsD client instances (including the statsd global instance) cannot be shared across processes but are thread-safe. Because of this, the parent process and each child process must create their own instances of the client or the buffering must be explicitly disabled by setting disable_buffering to True. See the documentation on datadog.dogstatsd for more details.
By using Datadog’s official Ruby library dogstatsd-ruby, the example below creates a buffered DogStatsD client instance that sends metrics in one packet when the flush is triggered:
By using Datadog’s official Java library java-dogstatsd-client, the example below creates a buffered DogStatsD client instance with a maximum packet size of 1500 bytes, meaning all metrics sent from this instance of the client are buffered and sent in packets of at most 1500 bytes:
By using Datadog’s official C# library dogstatsd-csharp-client, the example below creates a DogStatsD client with UDP as transport:
using System;
using StatsdClient;

public class DogStatsdClient
{
    public static void Main()
    {
        var dogstatsdConfig = new StatsdConfig
        {
            StatsdServerName = "127.0.0.1",
            StatsdPort = 8125,
        };

        using (var dogStatsdService = new DogStatsdService())
        {
            if (!dogStatsdService.Configure(dogstatsdConfig))
                throw new InvalidOperationException("Cannot initialize DogstatsD. Set optionalExceptionHandler argument in the `Configure` method for more information.");

            // Counter and Gauge are sent in the same datagram
            dogStatsdService.Counter("example_metric.count", 2, tags: new[] {"environment:dev"});
            dogStatsdService.Gauge("example_metric.gauge", 100, tags: new[] {"environment:dev"});
        }
    }
}
By using Datadog’s official PHP library php-datadogstatsd, the example below creates a buffered DogStatsD client instance that sends metrics in one packet when the block completes:
It is possible to reduce the traffic from your DogStatsD client to the Agent by setting a sample rate value for your client. For example, a sample rate of 0.5 halves the number of UDP packets sent. This solution is a trade-off: you decrease traffic but lose some precision and granularity.
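The sketch below shows the idea behind client-side sampling, assuming the standard `|@<rate>` annotation of the DogStatsD datagram format. The function `format_counter` is a hypothetical helper written for this illustration, not part of any client library: it either skips a sample or emits a counter line carrying the sample rate, so that the receiving side can scale counts back up.

```python
import random
from typing import Optional


def format_counter(name: str, value: int, sample_rate: float,
                   rng: random.Random) -> Optional[str]:
    """Return a DogStatsD counter line, or None when the sample is skipped."""
    if sample_rate < 1 and rng.random() >= sample_rate:
        return None  # Skipped on the client side: nothing is sent.
    suffix = f"|@{sample_rate}" if sample_rate < 1 else ""
    return f"{name}:{value}|c{suffix}"


rng = random.Random(42)  # Seeded only to make the example reproducible.
candidates = (format_counter("example_metric.count", 1, 0.5, rng)
              for _ in range(10_000))
sent = [line for line in candidates if line is not None]
print(f"{len(sent)} of 10000 samples would be sent")
```

With a rate of 0.5, roughly half of the calls produce a line to send, which is the traffic reduction described above.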
UDS is an inter-process communication protocol used to transport DogStatsD payloads. It has little overhead when compared to UDP and lowers the general footprint of DogStatsD on your system.
Client libraries can aggregate metrics on the client side, reducing the number of messages that have to be submitted to the Datadog Agent and improving I/O performance and throughput.
Client-side aggregation is available in DogStatsD C# client v7.0.0+ and is enabled by default. Client-side aggregation is available for gauges, counters, and sets.
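As a rough illustration of client-side aggregation, the sketch below keeps one running value per context (metric name plus tags) and only produces one message per context at flush time. The `Aggregator` class is hypothetical and covers only counters and gauges; it is not how any official client is implemented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Context = Tuple[str, Tuple[str, ...]]  # (metric name, sorted tags)


class Aggregator:
    """Hypothetical in-memory aggregator for counters and gauges."""

    def __init__(self) -> None:
        self.counts: Dict[Context, int] = defaultdict(int)
        self.gauges: Dict[Context, float] = {}

    def count(self, name: str, value: int, tags: Iterable[str] = ()) -> None:
        self.counts[(name, tuple(sorted(tags)))] += value  # Counters are summed.

    def gauge(self, name: str, value: float, tags: Iterable[str] = ()) -> None:
        self.gauges[(name, tuple(sorted(tags)))] = value  # Last value wins.

    def flush(self) -> List[str]:
        """Return one DogStatsD line per context and reset the state."""
        lines = [self._line(ctx, v, "c") for ctx, v in self.counts.items()]
        lines += [self._line(ctx, v, "g") for ctx, v in self.gauges.items()]
        self.counts.clear()
        self.gauges.clear()
        return lines

    @staticmethod
    def _line(ctx: Context, value, kind: str) -> str:
        name, tags = ctx
        suffix = f"|#{','.join(tags)}" if tags else ""
        return f"{name}:{value}|{kind}{suffix}"


agg = Aggregator()
for _ in range(1000):
    agg.count("example_metric.count", 1, tags=["env:dev"])
agg.gauge("example_metric.gauge", 3, tags=["env:dev"])
agg.gauge("example_metric.gauge", 7, tags=["env:dev"])
lines = agg.flush()
print(lines)  # 1002 calls collapse into 2 messages
```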
If your DogStatsD server uses UDS and is dropping packets at a high throughput, configuring the server to use more CPU may improve processing speed and decrease packet drops.
You may also want to configure your DogStatsD server this way if the client telemetry indicates packet drops and the server does not use more than 2 CPUs or 2 cores, even when more are available.
To reduce the number of packet drops:
Increase the client queue size to 8192. For more information, see the client library configuration. You may see the number of drops decrease, and your application may use more RAM.
Additionally, you can enable the dogstatsd_pipeline_autoadjust: true feature in your Datadog Agent configuration. The Agent uses multiple cores to process custom metrics, which may lead to higher CPU usage but lowers packet drops.
Most operating systems add incoming UDP and UDS datagrams containing your metrics to a buffer with a maximum size. Once the max is reached, datagrams containing your metrics start getting dropped. It is possible to adjust the values to give the Agent more time to process incoming metrics:
On most Linux distributions, the maximum size of the kernel buffer is set to 212992 by default (208 KiB). This can be confirmed using the following commands:
For UDS sockets, Linux internally buffers datagrams in a queue if the reader is slower than the writer. The size of this queue represents the maximum number of datagrams that Linux buffers per socket. This value can be queried with the following command:
sysctl net.unix.max_dgram_qlen
If the value is < 512, you can increase it to 512 or more using this command:
sysctl -w net.unix.max_dgram_qlen=512
Add the following configuration to /etc/sysctl.conf to make this change permanent:
net.unix.max_dgram_qlen = 512
In the same manner, net.core.wmem_max can be increased to 4MiB to improve client write performance:
net.core.wmem_max = 4194304
Then set the Agent dogstatsd_so_rcvbuf configuration option to the same number in datadog.yaml:
If you are using Kubernetes to deploy the Agent and/or DogStatsD and you want to configure the sysctls as mentioned above, set their value per container. If the net.* sysctls are namespaced, you can set them per pod. See the Kubernetes documentation on Using sysctls in a Kubernetes Cluster.
Avoid extra CPU usage by sending packets with an adequate size to the DogStatsD server in the Datadog Agent. The latest versions of the official DogStatsD clients send packets with a size optimized for performance.
You can skip this section if you are using one of the latest Datadog DogStatsD clients.
If the packets sent are too small, the Datadog Agent packs several together to process them in batches later in the pipeline. The official DogStatsD clients are capable of grouping metrics to have the best ratio of the number of metrics per packet.
The Datadog Agent performs best when the DogStatsD clients send packets the size of the dogstatsd_buffer_size. Packets must not be larger than the buffer size; otherwise, the Agent cannot load them completely into the buffer and the metrics end up malformed. Use the corresponding configuration field in your DogStatsD clients.
Note for UDP: Because UDP packets usually go through the Ethernet and IP layers, you can avoid IP packet fragmentation by limiting the packet size to a value lower than a single Ethernet frame on your network. Most of the time, IPv4 networks are configured with an MTU of 1500 bytes, so in this situation the size of sent packets should be limited to 1472 bytes.
Note for UDS: For the best performance, the UDS packet size should be 8192 bytes.
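The 1472-byte figure for UDP follows from subtracting the protocol headers from the MTU. The short sketch below spells out that arithmetic, assuming a minimum 20-byte IPv4 header (no options) and the 8-byte UDP header; `max_udp_payload` is a hypothetical helper name.

```python
IPV4_HEADER_BYTES = 20  # Minimum IPv4 header, assuming no IP options.
UDP_HEADER_BYTES = 8


def max_udp_payload(mtu: int, ip_header: int = IPV4_HEADER_BYTES) -> int:
    """Largest UDP payload that fits in one IP packet without fragmentation."""
    return mtu - ip_header - UDP_HEADER_BYTES


print(max_udp_payload(1500))  # 1472
```

If your network uses a different MTU, or IP options are in use, the same subtraction gives the limit to configure in your client.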
The Agent tries to absorb bursts of metrics sent by the DogStatsD clients, but doing so requires memory. Even if the memory is used only briefly and is quickly released to the OS, the resulting spike can be an issue in containerized environments, where memory limits can cause pods or containers to be evicted.
Avoid sending metrics in bursts in your application. This prevents the Datadog Agent from reaching its maximum memory usage.
Another way to limit the maximum memory usage is to reduce the buffering. The main buffer of the DogStatsD server within the Agent is configurable with the dogstatsd_queue_size field (since Datadog Agent 6.1.0). Its default value of 1024 induces an approximate maximum memory usage of 768MB.
Note: Reducing the buffer size could increase the number of packet drops.
This example decreases the max memory usage of DogStatsD to approximately 384MB:
dogstatsd_queue_size: 512
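The two figures above (1024 for about 768MB, 512 for about 384MB) suggest that the approximate maximum scales linearly with the queue size. The sketch below makes that arithmetic explicit. The linear scaling is an assumption inferred from those two data points, and `approx_max_memory_mb` is a hypothetical helper, so treat the result as a rough estimate only.

```python
DEFAULT_QUEUE_SIZE = 1024
DEFAULT_MAX_MEMORY_MB = 768  # Approximate maximum at the default queue size.


def approx_max_memory_mb(queue_size: int) -> float:
    """Estimate max DogStatsD memory usage, assuming linear scaling with queue size."""
    return queue_size * DEFAULT_MAX_MEMORY_MB / DEFAULT_QUEUE_SIZE


for size in (1024, 512, 256):
    print(size, "->", approx_max_memory_mb(size), "MB (approx.)")
```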
See the next section on burst detection to help you detect bursts of metrics from your applications.
DogStatsD has a stats mode in which you can see which metrics are the most processed.
Note: Enabling metrics stats mode can decrease DogStatsD performance.
To enable the stats mode, you can either:
Set dogstatsd_stats_enable to true in your configuration file
Set the environment variable DD_DOGSTATSD_STATS_ENABLE to true
Use the datadog-agent config set dogstatsd_stats true command to enable it at runtime. You can disable it at runtime using the command datadog-agent config set dogstatsd_stats false.
When this mode is enabled, run the command datadog-agent dogstatsd-stats. A list of the processed metrics is returned in descending order by the metrics received.
While running in this mode, the DogStatsD server runs a burst detection mechanism. If a burst is detected, a warning log is emitted. For example:
A burst of metrics has been detected by DogStatSd: here is the last 5 seconds count of metrics: [250 230 93899 233 218]
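To make the idea of a burst concrete, the sketch below flags a window of per-second counts when one second is far above the typical value. This is a hypothetical heuristic written for illustration only. It is not the Agent's actual detection algorithm, and both the function name `looks_like_burst` and the `factor` threshold are assumptions.

```python
from statistics import median
from typing import Sequence


def looks_like_burst(counts_per_second: Sequence[int], factor: float = 10.0) -> bool:
    """Return True if the busiest second exceeds `factor` times the median second."""
    if not counts_per_second:
        return False
    baseline = max(median(counts_per_second), 1)  # Avoid a zero baseline.
    return max(counts_per_second) > factor * baseline


# Counts taken from the example warning log above.
print(looks_like_burst([250, 230, 93899, 233, 218]))
# A steady stream of similar volume, for comparison.
print(looks_like_burst([250, 230, 240, 233, 218]))
```

A similar check could be run in your application before submitting metrics, to spot code paths that emit metrics in bursts.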
DogStatsD clients send telemetry metrics by default to the Agent. This allows you to better troubleshoot where bottlenecks exist. Each metric is tagged with the client language and the client version. These metrics are not counted as custom metrics.
Each client shares a set of common tags.
| Tag | Description | Example |
| --- | --- | --- |
| client | The language of the client | client:py |
| client_version | The version of the client | client_version:1.2.3 |
| client_transport | The transport used by the client (udp or uds) | client_transport:uds |
Note: When using UDP, network errors can’t be detected by the client and the corresponding metrics do not reflect byte or packet drops.
Metric type: count The number of metrics sent to the DogStatsD client by your application (before sampling and aggregation).
datadog.dogstatsd.client.metrics_by_type
Metric type: count The number of metrics sent by the DogStatsD client, before sampling and aggregation, tagged by metric type (gauge, set, count, timing, histogram, or distribution). Starting with v5.0.0 of the Go client.
datadog.dogstatsd.client.events
Metric type: count The number of events sent to the DogStatsD client by your application.
datadog.dogstatsd.client.service_checks
Metric type: count The number of service_checks sent to the DogStatsD client by your application.
datadog.dogstatsd.client.bytes_sent
Metric type: count The number of bytes successfully sent to the Agent.
datadog.dogstatsd.client.bytes_dropped
Metric type: count The number of bytes dropped by the DogStatsD client (this includes datadog.dogstatsd.client.bytes_dropped_queue and datadog.dogstatsd.client.bytes_dropped_writer).
datadog.dogstatsd.client.bytes_dropped_queue
Metric type: count The number of bytes dropped because the DogStatsD client queue was full.
datadog.dogstatsd.client.bytes_dropped_writer
Metric type: count The number of bytes dropped because of an error while writing to Datadog due to network timeout or error.
datadog.dogstatsd.client.packets_sent
Metric type: count The number of datagrams successfully sent to the Agent.
datadog.dogstatsd.client.packets_dropped
Metric type: count The number of datagrams dropped by the DogStatsD client (this includes datadog.dogstatsd.client.packets_dropped_queue and datadog.dogstatsd.client.packets_dropped_writer).
datadog.dogstatsd.client.packets_dropped_queue
Metric type: count The number of datagrams dropped because the DogStatsD client queue was full.
datadog.dogstatsd.client.packets_dropped_writer
Metric type: count The number of datagrams dropped because of an error while writing to Datadog due to network timeout or error.
Metric type: count The number of metrics dropped because the internal receiving channel is full (when using WithChannelMode()). Starting with v3.6.0 of the Go client when WithChannelMode() is enabled.
datadog.dogstatsd.client.aggregated_context
Metric type: count The total number of contexts flushed by the client when client-side aggregation is enabled. Starting with v5.0.0 of the Go client. This metric is reported only when aggregation is enabled (which is the default).
Metric type: count The total number of contexts flushed by the client, when client-side aggregation is enabled, tagged by metric type (gauge, set, count, timing, histogram, or distribution). Starting with v5.0.0 of the Go client. This metric is reported only when aggregation is enabled (which is the default).
To disable telemetry, use the WithoutTelemetry setting:
Metric type: count The number of contexts aggregated by type when client side aggregation is enabled. Starting with version v2.13.0. The metric is enabled by default starting v3.0.0 but requires enableDevMode(true) for v2.13.0. The metric is tagged by metrics_type.
datadog.dogstatsd.client.metrics_by_type
Metric type: count The number of metrics sent to the DogStatsD client by your application tagged by type (before sampling). Starting with version v2.13.0 when enableDevMode(true) is used and by default starting v3.0.0. The metric is tagged by metrics_type.
To disable telemetry, use the enableTelemetry(false) builder option:
Starting with version 1.5.0 of the PHP client, telemetry is enabled by default for the BatchedDogStatsd client and disabled by default for the DogStatsd client.
datadog.dogstatsd.client.metrics
Metric type: count The number of metrics sent to the DogStatsD client by your application (before sampling).
datadog.dogstatsd.client.events
Metric type: count The number of events sent to the DogStatsD client by your application.
datadog.dogstatsd.client.service_checks
Metric type: count The number of service_checks sent to the DogStatsD client by your application.
datadog.dogstatsd.client.bytes_sent
Metric type: count The number of bytes successfully sent to the Agent.
datadog.dogstatsd.client.bytes_dropped
Metric type: count The number of bytes dropped by the DogStatsD client.
datadog.dogstatsd.client.packets_sent
Metric type: count The number of datagrams successfully sent to the Agent.
datadog.dogstatsd.client.packets_dropped
Metric type: count The number of datagrams dropped by the DogStatsD client.
To enable or disable telemetry, use the disable_telemetry argument. Beware: using telemetry with the DogStatsd client increases network usage significantly. It is advised to use the BatchedDogStatsd client when using telemetry.
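When troubleshooting with the telemetry metrics above, a drop ratio is often easier to reason about than raw counters. The sketch below derives one from the values of datadog.dogstatsd.client.packets_sent and datadog.dogstatsd.client.packets_dropped over the same time window. `drop_ratio` is a hypothetical helper, and the input numbers are made up for the example.

```python
def drop_ratio(packets_sent: int, packets_dropped: int) -> float:
    """Fraction of datagrams dropped by the client out of all datagrams it tried to send."""
    total = packets_sent + packets_dropped
    return packets_dropped / total if total else 0.0


# Made-up counter values for one time window.
print(f"{drop_ratio(packets_sent=900, packets_dropped=100):.1%}")
```

A ratio that grows with traffic suggests that the measures described earlier (buffering, sampling, kernel buffer tuning) are worth revisiting.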