Investigate Slow Traces or Endpoints

If your application shows performance problems in production, combining distributed tracing with the stack trace data collected by profiling is a powerful way to identify the bottlenecks. Application processes that have both APM distributed tracing and the continuous profiler enabled are automatically linked.

You can move directly from span information to profiling data on the Profiles tab and find the specific lines of code related to performance issues. Similarly, you can debug slow and resource-consuming endpoints directly in the Profiling UI.

Identify code performance issues in slow traces


The Trace to Profiling integration is enabled when you:

  • Upgrade dd-trace-py to version 2.12.0+, 2.11.4+, or 2.10.7+.
  • Set the environment variable DD_PROFILING_TIMELINE_ENABLED to true.
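
The version requirement above lists one minimum release per minor series. The following is a minimal sketch of that rule, assuming that "2.11.4+" and "2.10.7+" mean patch releases within the 2.11 and 2.10 series and that "2.12.0+" means any later release. The helper names (parse_version, supports_timeline, timeline_enabled) are hypothetical and are not part of dd-trace-py.

```python
import re

# Minimum patch release for each backported minor series (from the list above).
MIN_PATCH_BY_SERIES = {(2, 10): 7, (2, 11): 4}
# Any release at or above this version is assumed to qualify.
FIRST_SUPPORTED_RELEASE = (2, 12, 0)


def parse_version(version):
    """Return (major, minor, patch) from a string such as '2.11.4'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    return tuple(int(group) for group in match.groups())


def supports_timeline(version):
    """Check a dd-trace-py version against the listed minimum releases."""
    major, minor, patch = parse_version(version)
    if (major, minor, patch) >= FIRST_SUPPORTED_RELEASE:
        return True
    min_patch = MIN_PATCH_BY_SERIES.get((major, minor))
    return min_patch is not None and patch >= min_patch


def timeline_enabled(environ):
    """Check that DD_PROFILING_TIMELINE_ENABLED is set to true.

    Comparing case-insensitively is an assumption of this sketch.
    """
    value = environ.get("DD_PROFILING_TIMELINE_ENABLED", "")
    return value.strip().lower() == "true"
```

For example, supports_timeline("2.11.3") is False because 2.11.3 predates the 2.11.4 backport, while supports_timeline("2.12.1") is True. To check a running process, pass os.environ to timeline_enabled.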

Span execution timeline view

The Profiles tab has a timeline view that breaks down threads and execution over time.

The timeline view surfaces time-based patterns and work distribution over the period of the span.

With the span timeline view, you can:

  • Isolate time-consuming methods.
  • Sort out complex interactions between threads.
  • Surface runtime activity that impacted the request.

Depending on the runtime and language, the lanes vary:

See prerequisites to learn how to enable this feature for Python.

Each lane represents a thread. Threads from a common pool are grouped together. You can expand the pool to view details for each thread.

Viewing a profile from a trace

Opening a view of the profile in a flame graph

From the timeline, click Open in Profiling to see the same data on a new page. From there, you can change the visualization to a flame graph. Click the Focus On selector to define the scope of the data:

  • Span & Children scopes the profiling data to the selected span and all descendant spans in the same service.
  • Span only scopes the profiling data to the previously selected span.
  • Span time period scopes the profiling data to all threads during the time period the span was active.
  • Full profile scopes the data to 60 seconds of the whole service process that executed the previously selected span.
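
The four Focus On scopes can be thought of as progressively wider filters over the same set of profiling samples. The sketch below illustrates that idea with a deliberately simplified, hypothetical data model (the Span and Sample classes, their fields, and the focus_on function are assumptions made for illustration). It is not how the profiler stores or queries data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Span:
    span_id: str
    parent_id: Optional[str]
    service: str
    start: float  # seconds
    end: float    # seconds


@dataclass(frozen=True)
class Sample:
    timestamp: float        # seconds
    thread: str
    span_id: Optional[str]  # span active on the thread, if any


def span_and_children(spans, root_id):
    """Collect the root span plus descendants that stay in the same service."""
    by_id = {span.span_id: span for span in spans}
    service = by_id[root_id].service
    selected = {root_id}
    changed = True
    while changed:
        changed = False
        for span in spans:
            if (span.span_id not in selected
                    and span.parent_id in selected
                    and span.service == service):
                selected.add(span.span_id)
                changed = True
    return selected


def focus_on(scope, spans, samples, span_id):
    """Filter samples according to one of the four Focus On scopes."""
    span = next(s for s in spans if s.span_id == span_id)
    if scope == "Span only":
        return [s for s in samples if s.span_id == span_id]
    if scope == "Span & Children":
        ids = span_and_children(spans, span_id)
        return [s for s in samples if s.span_id in ids]
    if scope == "Span time period":
        return [s for s in samples if span.start <= s.timestamp <= span.end]
    if scope == "Full profile":
        # The sample list is assumed to hold one 60-second profile.
        return list(samples)
    raise ValueError("unknown scope: %r" % scope)


# Made-up data: a root span with one child, plus unrelated thread activity.
spans: List[Span] = [
    Span("root", None, "web", start=0.0, end=10.0),
    Span("child", "root", "web", start=2.0, end=5.0),
]
samples: List[Sample] = [
    Sample(1.0, "worker-1", "root"),
    Sample(3.0, "worker-1", "child"),
    Sample(4.0, "gc", None),      # other thread, during the span
    Sample(20.0, "gc", None),     # other thread, after the span
]
```

With this made-up data, each scope returns a superset of the previous one: Span only keeps the one sample attributed to the root span, Span & Children adds the child span's sample, Span time period adds the unattributed sample taken on another thread while the span was active, and Full profile keeps everything.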

Break down code performance by API endpoints


Endpoint profiling is enabled by default when you turn on profiling for your Python service.

Requires dd-trace-py version 0.54.0+.

Endpoint profiling

Endpoint profiling allows you to scope your flame graphs by any endpoint of your web service to find endpoints that are slow, latency-heavy, and causing a poor end-user experience. It can be difficult to work out why such endpoints are slow. The cause may be unintended heavy resource consumption, such as the endpoint using a large number of CPU cycles.

With endpoint profiling you can:

  • Identify the bottleneck methods that are slowing down your endpoint’s overall response time.
  • Isolate the top endpoints responsible for the consumption of valuable resources such as CPU, memory, or exceptions. This is particularly helpful when you are generally trying to optimize your service for performance gains.
  • Understand whether third-party code or runtime libraries are the reason your endpoints are slow or resource-heavy.

Troubleshooting a slow endpoint by using endpoint aggregation

Surface code that impacted your production latency

In the APM Service page, use the information in the Profiling tab to correlate a latency or throughput change to a code performance change.

In this example, you can see how latency is linked to a lock contention increase on GET /train that is caused by the following line of code:


Track endpoints that consume the most resources

It is useful to track the top endpoints that consume resources such as CPU and wall time. The list can help you identify whether existing endpoints have regressed or whether newly introduced endpoints are consuming drastically more resources and slowing down your overall service.
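
As a rough illustration of what a top-endpoints list conveys, the following sketch ranks endpoints by their share of total CPU time. The function name rank_by_cpu_share, the endpoint names other than those mentioned on this page, and all numbers are made up for the example. The Profiling UI computes this for you.

```python
from typing import Dict, List, Tuple


def rank_by_cpu_share(cpu_ms_by_endpoint: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (endpoint, share of total CPU) pairs, largest consumer first."""
    total = sum(cpu_ms_by_endpoint.values())
    if total == 0:
        return []
    ranked = sorted(cpu_ms_by_endpoint.items(), key=lambda item: item[1], reverse=True)
    return [(endpoint, cpu_ms / total) for endpoint, cpu_ms in ranked]


# Hypothetical CPU time per endpoint over one time window, in milliseconds.
window = {
    "GET /store_history": 200.0,
    "GET /train": 500.0,
    "POST /checkout": 300.0,
}
ranking = rank_by_cpu_share(window)
```

Comparing such a ranking across two time windows is one way to spot an endpoint whose share has grown or an endpoint that did not exist before.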

The following image shows that GET /store_history is periodically impacting this service by consuming 20% of its CPU and 50% of its allocated memory:

Graphing top endpoints in terms of resource consumption

Track average resource consumption per request

Select Per endpoint call to see behavior changes even as traffic shifts over time. This is useful for progressive rollout sanity checks or analyzing daily traffic patterns.
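
The reason a per-call view stays meaningful as traffic shifts comes down to simple arithmetic: dividing total consumption by the number of calls removes the effect of request volume. The sketch below uses made-up numbers and a hypothetical helper, cpu_per_call, to show the difference between a traffic increase and a real regression.

```python
def cpu_per_call(total_cpu_ms, calls):
    """Average CPU time per request; 0.0 when the endpoint received no calls."""
    if calls == 0:
        return 0.0
    return total_cpu_ms / calls


# Hypothetical measurements for one endpoint over three time windows.
baseline = {"total_cpu_ms": 1000.0, "calls": 100}
more_traffic = {"total_cpu_ms": 2000.0, "calls": 200}  # twice the requests
regression = {"total_cpu_ms": 2000.0, "calls": 100}    # same requests, slower code

baseline_per_call = cpu_per_call(**baseline)
more_traffic_per_call = cpu_per_call(**more_traffic)
regression_per_call = cpu_per_call(**regression)
```

Total CPU doubles in both later windows, but the per-call value only changes in the regression case (20 ms instead of 10 ms). This is why the per-call view suits rollout checks, where the new version often receives a different share of traffic than the old one.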

The following example shows that CPU per request increased for GET /train:
