Exception Replay in Error Tracking

Exception Replay for APM Error Tracking is generally available for Python, and in Preview for Java, .NET, and PHP.

Overview

Exception Replay in APM Error Tracking automatically captures production variable values to help you reproduce exceptions from Error Tracking issues.

Requirements

Supported languages: Python, Java, .NET, PHP

  • Your Datadog Agent must be configured for APM.
  • Your application must be instrumented with:
    • ddtrace for Python
    • dd-trace-java for Java
    • dd-trace-dotnet for .NET
    • dd-trace-php for PHP

Exception Replay is only available in APM Error Tracking. It is not available for errors sourced from Logs or RUM.

Setup

  1. Upgrade the Datadog Agent to version 7.44.0 or higher.
  2. Upgrade the APM tracer library to the minimum required version or higher:
    • ddtrace version 1.16.0+
    • dd-trace-java version 1.47.0+
    • dd-trace-dotnet version 2.53.0+
    • dd-trace-php version 1.5.0+
  3. Run your service with the DD_EXCEPTION_REPLAY_ENABLED environment variable set to true.
  4. Create a logs index and configure it to the desired retention with no sampling.
    • Set the filter to match on the source:dd_debugger tag.
    • Ensure that the new index takes precedence over any others with filters that match that tag, because the first match wins.
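
Steps 1 to 3 can be checked with a small preflight script before a rollout. The following is a minimal sketch, not part of any Datadog tooling: the function names (parse_version, meets_minimum, preflight) are hypothetical, the version numbers are copied from the steps above, and the caller is assumed to supply the installed Agent and tracer versions as plain numeric strings such as "1.16.0".

```python
import os

# Minimum versions copied from the setup steps above.
MIN_AGENT_VERSION = "7.44.0"
MIN_TRACER_VERSIONS = {
    "ddtrace": "1.16.0",
    "dd-trace-java": "1.47.0",
    "dd-trace-dotnet": "2.53.0",
    "dd-trace-php": "1.5.0",
}


def parse_version(text):
    """Turn '1.16.0' into (1, 16, 0) so versions compare numerically.

    Assumption: plain numeric versions only (no 'rc1' or similar suffixes).
    """
    return tuple(int(part) for part in text.split(".")[:3])


def meets_minimum(installed, minimum):
    """Return True when the installed version is at or above the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def preflight(agent_version, tracer_name, tracer_version, env=None):
    """Return a list of problems; an empty list means every check passed."""
    env = os.environ if env is None else env
    problems = []
    if not meets_minimum(agent_version, MIN_AGENT_VERSION):
        problems.append(f"Agent {agent_version} is older than {MIN_AGENT_VERSION}")
    minimum = MIN_TRACER_VERSIONS[tracer_name]
    if not meets_minimum(tracer_version, minimum):
        problems.append(f"{tracer_name} {tracer_version} is older than {minimum}")
    if env.get("DD_EXCEPTION_REPLAY_ENABLED", "").lower() != "true":
        problems.append("DD_EXCEPTION_REPLAY_ENABLED is not set to true")
    return problems
```

Comparing versions as integer tuples rather than strings matters here: as strings, "1.9.0" sorts after "1.16.0", which would wrongly pass the check.
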

Why create a logs index? When an error occurs and is captured in an APM span, Exception Replay variable snapshots are captured as logs with reference links to the APM span. When you view the error in Error Tracking Explorer, variable snapshots from the log data are displayed alongside the stack trace details.
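
To see why index order matters, consider a toy model of first-match routing. This sketch is an illustration only and does not reflect how Datadog evaluates index filters internally: the index names ("exception-replay", "main") are hypothetical, and each filter is simplified to either a single tag or a catch-all "*".

```python
def route_log(tags, indexes):
    """Return the name of the first index whose filter matches, or None.

    `indexes` is an ordered list of (name, filter_tag) pairs. A filter of
    "*" matches every log; any other filter matches when that tag is present.
    """
    for name, filter_tag in indexes:
        if filter_tag == "*" or filter_tag in tags:
            return name
    return None


# Tags on a hypothetical Exception Replay snapshot log.
snapshot_tags = {"source:dd_debugger", "env:prod"}

# Dedicated index listed first: the snapshot lands where its retention is set.
dedicated_first = [("exception-replay", "source:dd_debugger"), ("main", "*")]

# Catch-all listed first: the snapshot never reaches the dedicated index.
catch_all_first = [("main", "*"), ("exception-replay", "source:dd_debugger")]

print(route_log(snapshot_tags, dedicated_first))
print(route_log(snapshot_tags, catch_all_first))
```

In the second ordering the snapshot is subject to whatever retention, sampling, or exclusion filters apply to the catch-all index, which is the situation step 4 is meant to avoid.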

Redacting sensitive data

By default, Exception Replay automatically redacts variable data linked to sensitive identifiers like password and accessToken. See the full list of redacted identifiers.

To scrub Exception Replay variable snapshots for PII and other sensitive data, see Dynamic Instrumentation Sensitive Data Scrubbing.

Note: Dynamic Instrumentation is NOT a prerequisite for Sensitive Data Scrubbing. Sensitive Data Scrubbing applies to Exception Replay variable snapshots by default regardless of whether Dynamic Instrumentation is enabled on the service.
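
The idea of identifier-based redaction can be illustrated with a short sketch. This is not the tracer's implementation: the placeholder string, the normalization rule (lowercasing and dropping underscores and hyphens), and the function names are all assumptions made for the example, and the set below contains only the two identifiers named in the text, not the full list.

```python
REDACTED = "{redacted}"  # hypothetical placeholder, chosen for this example

# Only the two identifiers mentioned above; the real list is longer.
SENSITIVE_IDENTIFIERS = {"password", "accesstoken"}


def normalize(name):
    """Assumed normalization so access_token and accessToken compare equal."""
    return name.replace("_", "").replace("-", "").lower()


def redact(variables):
    """Return a copy of `variables` with sensitive values replaced.

    Nested dictionaries are walked recursively; other values pass through.
    """
    result = {}
    for name, value in variables.items():
        if normalize(name) in SENSITIVE_IDENTIFIERS:
            result[name] = REDACTED
        elif isinstance(value, dict):
            result[name] = redact(value)
        else:
            result[name] = value
    return result


captured = {
    "user": "ada",
    "password": "hunter2",
    "session": {"access_token": "abc123", "ttl": 30},
}
print(redact(captured))
```

Matching on the variable name rather than the value means a secret stored under an unlisted name would not be caught by this approach alone, which is why value-based scrubbing is offered as a complement.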

Getting started

  1. Navigate to APM > Error Tracking.
  2. Click an Error Tracking issue on a service with Exception Replay enabled.
  3. Scroll down to the stack trace component.
  4. Expand stack frames to examine captured variable values.

Troubleshooting

A specific error trace does not have variable values

Exception Replay variable snapshots are rate limited to ensure negligible impact on application performance. For a given exception or issue, a variable snapshot is captured at most once per hour (per instance or pod). If variable values are not visible on a trace, try these options:

  • Confirm Exception Replay is enabled on the source service and environment.
  • Click View Similar Errors.
  • Expand the time range selection to find error instances with captured variable values.
  • Use the search query @error.debug_info_captured:true in Error Tracking Explorer.
  • Check Log Indexes to confirm logs with the tag source:dd_debugger have appropriate retention and aren’t affected by Exclusion Filters in preceding indexes.
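
The rate limit described above explains why most traces for a recurring error carry no variable values. The sketch below models a limit of one snapshot per exception per hour within a single process. It is a conceptual illustration, not the tracer's code: the class name, the exception keys, and the injectable clock are hypothetical.

```python
import time


class SnapshotRateLimiter:
    """Allow at most one capture per exception key per interval."""

    def __init__(self, interval_seconds=3600, clock=time.monotonic):
        self._interval = interval_seconds
        self._clock = clock          # injectable so the limiter can be tested
        self._last_capture = {}      # exception key -> time of last capture

    def should_capture(self, exception_key):
        """Return True and record the time if a capture is allowed now."""
        now = self._clock()
        last = self._last_capture.get(exception_key)
        if last is not None and now - last < self._interval:
            return False
        self._last_capture[exception_key] = now
        return True


# Simulated clock so the example does not need to wait an hour.
fake_now = [0.0]
limiter = SnapshotRateLimiter(clock=lambda: fake_now[0])

first = limiter.should_capture("ValueError@checkout")    # first occurrence
fake_now[0] = 1800.0
second = limiter.should_capture("ValueError@checkout")   # 30 minutes later
other = limiter.should_capture("KeyError@cart")          # different exception
fake_now[0] = 3600.0
third = limiter.should_capture("ValueError@checkout")    # one hour later
```

Because the state lives in each process, every instance or pod keeps its own budget. Widening the time range or viewing similar errors therefore increases the chance of finding an occurrence that was captured.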
