Ingestion Controls

Ingestion controls affect what traces are sent by your applications to Datadog. APM Metrics are always calculated based on all traces, and are not impacted by ingestion controls.

The Ingestion Control page provides visibility into the ingestion configuration of your applications and services. From the Ingestion Control page, you can:

  • Gain visibility on your service-level ingestion configuration.
  • Adjust trace sampling rates for high throughput services or endpoints to better manage ingestion budget.
  • Adjust trace sampling rates for low throughput, rare traffic services or endpoints to increase visibility.
  • Understand which ingestion mechanisms are responsible for sampling most of your traces.
  • Investigate and act on potential ingestion configuration issues, such as limited CPU or RAM resources for the Agent.

Understanding your ingestion configuration

Use the data in the Ingestion Control header to monitor your trace ingestion. The header displays the total amount of data ingested over the past hour, your estimated monthly usage, and that usage as a percentage of your monthly ingestion allotment. The allotment is calculated based on your active APM infrastructure (such as hosts, Fargate tasks, and serverless functions).

If the monthly usage is under 100%, the projected ingested data fits within your monthly allotment. A monthly usage value over 100% means that the monthly ingested data is projected to be over your monthly allotment.
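
As a rough illustration of how to read this percentage, the sketch below extrapolates a monthly volume from the past hour of ingestion and compares it with an allotment. The linear extrapolation, the function name, the hours-per-month constant, and the sample figures are all assumptions made for illustration; the projection shown in the header may be computed differently.

```python
HOURS_PER_MONTH = 730  # assumption: average month length in hours


def projected_monthly_usage_pct(bytes_last_hour: float, monthly_allotment_bytes: float) -> float:
    """Return the projected monthly ingestion as a percentage of the allotment."""
    if monthly_allotment_bytes <= 0:
        raise ValueError("monthly_allotment_bytes must be positive")
    projected_bytes = bytes_last_hour * HOURS_PER_MONTH
    return 100.0 * projected_bytes / monthly_allotment_bytes


# Hypothetical figures: 2 GB ingested in the past hour, 1,000 GB monthly allotment.
pct = projected_monthly_usage_pct(2e9, 1000e9)
status = "over allotment" if pct > 100 else "within allotment"
print(f"{pct:.0f}% ({status})")
```

A value above 100 here corresponds to the case where the projected volume exceeds the monthly allotment.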

Ingestion levels by service

The service table contains information about the ingested volumes and ingestion configuration, broken down by service:

Type
The service type: web service, database, cache, browser, and so on.
Name
The name of each service sending traces to Datadog. The table contains root and non-root services for which data was ingested in the past hour.
Ingested Traces/s
Average number of traces per second ingested starting from the service over the past hour.
Ingested Bytes/s
Average number of bytes per second ingested for the service over the past hour.
Downstream Bytes/s
Average number of bytes per second ingested for which the service makes the sampling decision. This includes the bytes of all downstream services’ spans in the call stack that follow the decision made at the head of the trace. This column’s data is based on the sampling_service dimension, set on the datadog.estimated_usage.apm.ingested_bytes metrics. For more information, read APM usage metrics.
Traffic Breakdown
A detailed breakdown of traffic sampled and unsampled for traces starting from the service. See Traffic breakdown for more information.
Ingestion Configuration
Shows Automatic if the default head-based sampling mechanism from the Agent applies. If ingestion was configured with trace sampling rules, the service is marked as Configured, with a Local label when the sampling rule is applied from configuration in the tracing library, or a Remote label when the sampling rule is applied remotely from the UI. For more information about configuring ingestion for a service, read about changing the default ingestion rate.
Infrastructure
Hosts, containers, and functions on which the service is running.
Service status
Shows Limited Resource when some spans are dropped due to the Datadog Agent reaching CPU or RAM limits set in its configuration, Legacy Setup when some spans are ingested through the legacy App Analytics mechanism, or OK otherwise.

Filter the page by environment, configuration, and status to view services for which you need to take an action. To reduce the global ingestion volume, sort the table by the Downstream Bytes/s column to view services responsible for the largest share of your ingestion.

Note: The table is powered by the usage metrics datadog.estimated_usage.apm.ingested_spans and datadog.estimated_usage.apm.ingested_bytes. These metrics are tagged by service, env and ingestion_reason.
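
To make the difference between the Ingested Bytes/s and Downstream Bytes/s columns concrete, the following sketch aggregates a handful of span records both ways and then ranks services by downstream bytes. The span records, service names, and byte sizes are hypothetical; the only idea taken from the text above is that downstream bytes are attributed to the service that made the sampling decision (the sampling_service dimension).

```python
from collections import defaultdict

# Hypothetical span records: (service, sampling_service, ingested_bytes).
# sampling_service is the service that made the sampling decision for the trace.
spans = [
    ("web-frontend", "web-frontend", 4_000),
    ("checkout-api", "web-frontend", 6_000),
    ("payments-db", "web-frontend", 2_000),
    ("batch-worker", "batch-worker", 3_000),
    ("payments-db", "batch-worker", 1_000),
]

ingested = defaultdict(int)    # bytes from each service's own spans
downstream = defaultdict(int)  # bytes attributed to the deciding service

for service, sampling_service, size in spans:
    ingested[service] += size
    downstream[sampling_service] += size

# Rank services by downstream bytes, largest first.
ranking = sorted(downstream.items(), key=lambda item: item[1], reverse=True)
for service, total in ranking:
    print(f"{service}: downstream={total} ingested={ingested[service]}")
```

In this made-up data set, payments-db ingests bytes but never appears in the ranking, because it never makes the sampling decision. Reducing its volume would mean changing the sampling rates of the services upstream of it.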

Traffic breakdown

The Traffic Breakdown column breaks down the destination of all traces starting from the service. It gives you an estimate of the share of traffic that is ingested and dropped, and for which reasons.

The breakdown is composed of the following parts:

  • Complete traces ingested (blue): The percentage of traces that have been ingested by Datadog.

  • Complete traces not retained (gray): The percentage of traces that have not been ingested by Datadog. Some traces might be dropped because:

    1. By default, the Agent automatically sets a sampling rate on services, depending on service traffic.
    2. The service is configured to ingest a certain percentage of traces using sampling rules.
  • Complete traces dropped by the tracer rate limiter (orange): When you choose to manually set the service ingestion rate as a percentage with trace sampling rules, a rate limiter is automatically enabled, set to 100 traces per second by default. See the rate limiter documentation to change this rate.

  • Traces dropped due to the Agent CPU or RAM limit (red): This mechanism may drop spans and create incomplete traces. To fix this, increase the CPU and memory allocation for the infrastructure that the Agent runs on.
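
The arithmetic behind the first three parts of the breakdown can be sketched as follows. This is a simplified model under stated assumptions: a single service with a fixed sampling rate set through sampling rules, a rate limiter that caps the sampled traces at 100 traces per second, and no drops caused by Agent resource limits. The function name and the traffic figures are hypothetical.

```python
def traffic_breakdown(incoming_tps: float, sample_rate: float, rate_limit_tps: float = 100.0) -> dict:
    """Split incoming traces per second into percentage shares of the breakdown."""
    if incoming_tps <= 0:
        raise ValueError("incoming_tps must be positive")
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    sampled_tps = incoming_tps * sample_rate          # kept by the sampling rule
    ingested_tps = min(sampled_tps, rate_limit_tps)   # what the rate limiter lets through
    limited_tps = sampled_tps - ingested_tps          # dropped by the rate limiter
    not_retained_tps = incoming_tps - sampled_tps     # dropped by the sampling rule
    return {
        "ingested": 100 * ingested_tps / incoming_tps,
        "not_retained": 100 * not_retained_tps / incoming_tps,
        "rate_limited": 100 * limited_tps / incoming_tps,
    }


# Hypothetical service: 2,000 traces per second sampled at 10%.
shares = traffic_breakdown(incoming_tps=2000, sample_rate=0.1)
for name, pct in shares.items():
    print(f"{name}: {pct:.0f}%")
```

With these inputs, the sampling rule keeps 200 traces per second but the rate limiter passes only 100, so raising the sampling rate without also raising the limit would not increase ingestion.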

Configuring ingestion for a service

Click on any service to view the Service Ingestion Summary, which provides actionable insights and configuration options for managing that service’s trace ingestion.

Sampling rates by resource

The table lists the applied sampling rates by resource of the service.

  • The Ingested bytes column surfaces the ingested bytes from spans of the service and resource, while the Downstream bytes column surfaces the ingested bytes from spans where the sampling decision is made starting from that service and resource, including bytes from downstream services in the call chain.
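  • The Configuration column surfaces where the resource sampling rate is being applied from:
    • Automatic: The default head-based sampling mechanism from the Agent applies.
    • Local Configured: A sampling rule was set locally in the tracing library.
    • Remote Configured: A sampling rule was set remotely, from the Datadog UI.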

Note: If the service is not making sampling decisions, the service’s resources will be collapsed under the Resources not making sampling decisions row.

Ingestion Reasons and sampling decision makers

Explore the Ingestion reasons breakdown to see which mechanisms are responsible for your service ingestion. Each ingestion reason relates to one specific ingestion mechanism. After changing your service ingestion configuration, you can observe the increase or decrease of ingested bytes and spans in this timeseries graph based on the past hour of ingested data.

If most of your service ingestion volume is due to decisions taken by upstream services, investigate the detail of the Sampling decision makers top list. For example, if your service is non-root (meaning that it never decides to sample traces), review all upstream services responsible for your non-root service's ingestion. Configure upstream root services to reduce your overall ingestion volume.

For further investigations, use the APM Trace - Estimated Usage Dashboard, which provides global ingestion information as well as breakdown graphs by service, env and ingestion reason.

Agent and tracing library versions

See the Datadog Agent and tracing library versions your service is using. Compare the versions in use to the latest released versions to make sure your Agents and libraries are up to date.

Note: You need to upgrade the Agent to v6.34 or v7.34 for the version information to be reported.

Configure the service ingestion rates by resource

Adaptive sampling is in Preview!

Adaptive sampling rates let Datadog control sampling rates on your behalf to match a configured monthly ingested volume budget. Follow the instructions in the Adaptive sampling guide to get started. To request access to the feature, complete the Request Access form.

To configure sampling rates for the service by resource name:

  1. Click Manage Ingestion rate.
  2. Click Add new rule to set sampling rates for some resources. Sampling rules use glob pattern matching, so you can use wildcards (*) to match against multiple resources at the same time.

If the Remote Configuration option is available:

  3. Click Apply to save the configuration.

The configuration should take effect in less than a minute. You do not need to redeploy the service for the change to take effect. You can observe the configuration changes from the Live Search Explorer.

In the Service Ingestion Summary, resources for which the sampling rate is remotely applied should show as Remote Configured in the Configuration column.

If the Remote Configuration option is disabled:

To use remote configuration, make sure that all of the requirements listed under Remote configuration requirements are met. Otherwise, apply the rule locally:

  3. Apply the configuration generated from your choices to the indicated service, then redeploy the service. Note: The service name value is case sensitive and must match the case of your service name.

  4. On the Ingestion Control page, confirm that the new percentage has been applied by checking the Traffic Breakdown column, which shows the applied sampling rate. Resources for which the sampling rate was applied should show as Local Configured.
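
As an illustration of what a locally applied, resource-based rule set can look like, the sketch below builds a JSON value for the DD_TRACE_SAMPLING_RULES environment variable and checks which rule a given resource would match. Several things here are assumptions: the service and resource names are hypothetical, support for the resource key depends on your tracing library version, Python's fnmatch module stands in for the tracing library's own glob matcher, and the first-matching-rule-wins evaluation order is assumed. Treat the configuration generated by the UI as the source of truth.

```python
import fnmatch
import json

# Hypothetical service and resource names; sample_rate is a fraction between 0 and 1.
rules = [
    {"service": "checkout-api", "resource": "GET /health*", "sample_rate": 0.01},
    {"service": "checkout-api", "resource": "POST /checkout/*", "sample_rate": 1.0},
    {"service": "checkout-api", "resource": "*", "sample_rate": 0.2},
]


def matching_rate(service: str, resource: str):
    """Return the sample rate of the first rule whose globs match, else None."""
    for rule in rules:
        service_ok = fnmatch.fnmatchcase(service, rule["service"])
        resource_ok = fnmatch.fnmatchcase(resource, rule["resource"])
        if service_ok and resource_ok:
            return rule["sample_rate"]
    return None


# Value that could be exported as DD_TRACE_SAMPLING_RULES before redeploying.
print(json.dumps(rules))
print(matching_rate("checkout-api", "POST /checkout/cart"))
print(matching_rate("checkout-api", "GET /orders"))
print(matching_rate("Checkout-API", "GET /orders"))
```

The last call uses a differently cased service name and matches no rule, which mirrors the note that the service name value is case sensitive.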

Remote configuration requirements

  • Datadog Agent 7.41.1 or higher.
  • Remote Configuration enabled for your Agent.
  • APM Remote Configuration Write permissions. If you don’t have these permissions, ask your Datadog admin to update your permissions from your organization settings.

The following table lists the minimum tracing library version required for the feature:

Language    Minimum version required
Java        v1.34.0
Go          v1.64.0
Python      v2.9.0
Ruby        v2.0.0
Node.js     v5.16.0
PHP         v1.4.0
.NET        v2.53.2
C++         v0.2.2
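
If you track library versions in your own tooling, the requirements above can be checked with a small helper such as the following sketch. The function names and dictionary keys are hypothetical, and the parser assumes plain MAJOR.MINOR.PATCH version strings; it does not handle pre-release or build suffixes.

```python
# Minimum versions copied from the requirements above.
MIN_AGENT_VERSION = (7, 41, 1)
MIN_TRACER_VERSIONS = {
    "java": (1, 34, 0),
    "go": (1, 64, 0),
    "python": (2, 9, 0),
    "ruby": (2, 0, 0),
    "nodejs": (5, 16, 0),
    "php": (1, 4, 0),
    "dotnet": (2, 53, 2),
    "cpp": (0, 2, 2),
}


def parse_version(text: str) -> tuple:
    """Turn 'v1.34.0' or '1.34.0' into (1, 34, 0)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def supports_remote_rules(language: str, tracer_version: str, agent_version: str) -> bool:
    """Check the Agent and tracing library versions against the minimums."""
    return (
        parse_version(agent_version) >= MIN_AGENT_VERSION
        and parse_version(tracer_version) >= MIN_TRACER_VERSIONS[language]
    )


print(supports_remote_rules("python", "v2.9.0", "7.41.1"))
print(supports_remote_rules("go", "1.63.9", "7.50.0"))
```

Version checks cover only the first requirement and the library minimums; Remote Configuration must still be enabled for the Agent, and the user needs the write permission listed above.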

Managing Datadog Agent ingestion configuration

Click Configure Datadog Agent Ingestion to manage default head-based sampling rates, error sampling and rare sampling.

  • Head-based Sampling: When no sampling rules are set for a service, the Datadog Agent automatically computes sampling rates to be applied for your services, targeting 10 traces per second per Agent. Change this target number of traces in Datadog, or set DD_APM_MAX_TPS locally at the Agent level.
  • Error Spans Sampling: For traces not caught by head-based sampling, the Datadog Agent catches local error traces up to 10 traces per second per Agent. Change this target number of traces in Datadog, or set DD_APM_ERROR_TPS locally at the Agent level.
  • Rare Spans Sampling: For traces not caught by head-based sampling, the Datadog Agent catches local rare traces up to 5 traces per second per Agent. This setting is disabled by default. Enable the collection of rare traces in Datadog, or set DD_APM_ENABLE_RARE_SAMPLER locally at the Agent level.
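
To build intuition for what a target of 10 traces per second means, the sketch below computes the fraction of traces that would need to be kept to stay near a target, for a single stream of traffic. This is a deliberately simplified model and not the Agent's actual algorithm, which computes rates across services; the function name and traffic figures are hypothetical.

```python
DEFAULT_MAX_TPS = 10.0  # default head-based sampling target per Agent, from the text above


def target_sampling_rate(observed_tps: float, max_tps: float = DEFAULT_MAX_TPS) -> float:
    """Fraction of traces to keep so that kept traffic stays at or below max_tps."""
    if observed_tps <= max_tps:
        return 1.0  # low traffic: everything fits under the target
    return max_tps / observed_tps


# Hypothetical traffic levels, in traces per second.
for tps in (4, 50, 1000):
    print(tps, round(target_sampling_rate(tps), 3))
```

Under this model, low-throughput traffic is kept in full while high-throughput traffic is sampled down proportionally, which is why raising DD_APM_MAX_TPS mostly affects busy services.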

With remote configuration, you don’t have to restart the Agent to update these parameters. Click Apply to save the configuration changes, and the new configuration takes effect immediately. Remote configuration for Agent sampling parameters is available if you are using Agent version 7.42.0 or higher.

Note: The Other Ingestion Reasons (gray) section of the pie chart represents other ingestion reasons which are not configurable at the Datadog Agent level.

Note: Remotely configured parameters take precedence over local configurations such as environment variables and datadog.yaml configuration.

Sampling precedence rules

If sampling rules are set in multiple locations, the following precedence rules apply in order, where rules that appear first on the list can override lower precedence rules:

  1. Remotely configured sampling rules, set through resource-based sampling
  2. Adaptive sampling rules
  3. Locally configured sampling rules (DD_TRACE_SAMPLING_RULES)
  4. Remotely configured global sampling rate
  5. Locally configured global sampling rate (DD_TRACE_SAMPLE_RATE)
  6. Sampling rates computed by the Datadog Agent, controlled indirectly through Agent settings set remotely or locally (DD_APM_MAX_TPS)

To phrase it another way, Datadog uses the following precedence rules:

  • Tracer settings > Agent settings
  • Sampling rules > Global sampling rate
  • Remote > Local
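
The precedence order can be expressed as a simple lookup, as in the sketch below. The source labels are hypothetical names chosen for this example, and the sketch only answers which configured source ranks highest. It ignores whether a given sampling rule actually matches a particular trace, which also determines the rate applied in practice.

```python
# Sources in precedence order, highest first, mirroring the numbered list above.
PRECEDENCE = [
    "remote_resource_rules",
    "adaptive_rules",
    "local_sampling_rules",   # DD_TRACE_SAMPLING_RULES
    "remote_global_rate",
    "local_global_rate",      # DD_TRACE_SAMPLE_RATE
    "agent_rates",            # controlled indirectly, for example with DD_APM_MAX_TPS
]


def effective_source(configured: set) -> str:
    """Return the highest-precedence source present in `configured`."""
    for source in PRECEDENCE:
        if source in configured:
            return source
    return "agent_rates"  # Agent-computed rates apply when nothing else is set


print(effective_source({"local_sampling_rules", "remote_global_rate"}))
print(effective_source({"local_global_rate", "remote_resource_rules"}))
print(effective_source(set()))
```

The first call shows a locally configured sampling rule ranking above a remotely configured global rate, consistent with sampling rules taking precedence over global sampling rates.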
