Quantization of APM Data

Overview

During ingestion, Datadog quantizes variable values in span and resource names, such as random globally unique IDs (GUIDs), numeric IDs, and query parameter values. This normalization reduces name pollution by grouping spans and resources that differ only in those random values, because they are the same for analysis purposes.

Certain patterns in resource or span names are replaced with the following static strings:

  • GUIDs: {guid}
  • Numeric IDs (6+ digit numbers surrounded by non-alphanumeric characters or found at the end of a string): {num}
  • Query parameter values: {val}

These replacements affect:

  • trace metric names,
  • the resource name tag on those metrics, and
  • the resource and span names for all ingested spans.
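
The {guid} and {num} rules above can be approximated with two regular expressions. This is a minimal sketch and not the Agent's actual implementation: the function name quantize, the assumed GUID shape (8-4-4-4-12 hexadecimal digits joined by "-" or "_"), and the exact boundary handling are illustrative assumptions. The {val} rule is left out because the matching logic for query parameter values is not described here.

```python
import re

# Assumption: a GUID is 8-4-4-4-12 hex digits joined by "-" or "_".
GUID_RE = re.compile(
    r"[0-9a-fA-F]{8}[-_][0-9a-fA-F]{4}[-_][0-9a-fA-F]{4}"
    r"[-_][0-9a-fA-F]{4}[-_][0-9a-fA-F]{12}"
)

# 6+ digits preceded by a non-alphanumeric character and followed by
# a non-alphanumeric character or the end of the string.
NUM_RE = re.compile(r"(?<=[^0-9A-Za-z])[0-9]{6,}(?=[^0-9A-Za-z]|$)")


def quantize(name: str) -> str:
    """Approximate the {guid} and {num} replacements on a span or resource name."""
    # GUIDs go first so all-digit GUID groups are not mistaken for numeric IDs.
    name = GUID_RE.sub("{guid}", name)
    return NUM_RE.sub("{num}", name)


print(quantize("get_order_9f1c2b3a-0d4e-4f5a-8b6c-7d8e9f0a1b2c"))
print(quantize("/reports/20240131/summary"))
```

Names with fewer than six consecutive digits, such as order_12345, pass through this sketch unchanged.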

Quantization examples

For example, if a span name is find_user_2461685a_80c9_4d9e_85e9_a3b0e9e3ea84, it is renamed to find_user_{guid} and the resulting trace metrics are:

  • trace.find_user_guid
  • trace.find_user_guid.hits
  • trace.find_user_guid.errors
  • trace.find_user_guid.duration
  • trace.find_user_guid.apdex (if Apdex is configured for the service)

To search for these spans in trace search, the query is operation_name:"find_user_{guid}".

If a resource name is SELECT ? FROM TABLE temp_128390123, it is renamed to SELECT ? FROM TABLE temp_{num} and its metric-normalized tag is resource_name:select_from_table_temp_num.

To search for these spans in trace search, the query is resource_name:"SELECT ? FROM TABLE temp_{num}".
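
The two examples above show that the quantized name keeps its braces in trace search, while the metric name and metric tag use a normalized form. The following sketch reproduces that mapping for these two examples only. The function name normalize_for_metric and its rules (lowercase, underscores for other characters, collapsed and trimmed underscores) are simplifying assumptions, not Datadog's full normalization rules.

```python
import re


def normalize_for_metric(name: str) -> str:
    """Simplified normalization: lowercase, underscores for everything else."""
    lowered = name.lower()
    # Replace each run of characters outside [a-z0-9_] with one underscore.
    replaced = re.sub(r"[^a-z0-9_]+", "_", lowered)
    # Collapse repeated underscores and trim them from both ends.
    return re.sub(r"_+", "_", replaced).strip("_")


print("trace." + normalize_for_metric("find_user_{guid}"))
# trace.find_user_guid
print("resource_name:" + normalize_for_metric("SELECT ? FROM TABLE temp_{num}"))
# resource_name:select_from_table_temp_num
```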

Changing instrumentation to avoid default quantization

Note: Any change to span and resource names upstream in the instrumentation or the Agent produces new metrics and tags. If you use queries on quantized data, those queries must be updated to work with the new names.

In-code instrumentation

If your application runs in an agentless setup or if you prefer to make instrumentation changes more directly in your code, see the tracer documentation of your application’s runtime for information on how to create custom configuration for span names and resource names.

Agent configuration

You can use the replace_tags YAML configuration option to define your own replacement strings with regular expressions in Go syntax:

apm_config:
  replace_tags:
    # Replace trailing numeric IDs in span names with "x":
    - name: "span.name"
      pattern: "get_id_[0-9]+"
      repl: "get_id_x"
    # Replace numeric IDs in resource paths:
    - name: "resource.name"
      pattern: "/users/[0-9]+/"
      repl: "/users/{user_id}/"
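
Before deploying rules like these, it can help to preview their effect on sample names. The sketch below is an illustration, not the Agent's code: the function apply_replace_rules and the dictionary representation of a span are hypothetical, and Python's re module stands in for Go's regexp package. The simple patterns in this example behave the same in both, but more complex patterns should be checked against Go's regular expression syntax.

```python
import re

# Same rules as the YAML example, expressed as Python dictionaries.
RULES = [
    {"name": "span.name", "pattern": r"get_id_[0-9]+", "repl": "get_id_x"},
    {"name": "resource.name", "pattern": r"/users/[0-9]+/", "repl": "/users/{user_id}/"},
]


def apply_replace_rules(span: dict, rules: list) -> dict:
    """Return a copy of span with each rule applied to the field it names."""
    result = dict(span)
    for rule in rules:
        field = rule["name"]
        if field in result:
            # A function replacement inserts repl literally (no escape handling).
            result[field] = re.sub(
                rule["pattern"], lambda _m, r=rule["repl"]: r, result[field]
            )
    return result


# Hypothetical sample span, for illustration only.
sample = {"span.name": "get_id_48213", "resource.name": "GET /users/1234/orders"}
print(apply_replace_rules(sample, RULES))
# {'span.name': 'get_id_x', 'resource.name': 'GET /users/{user_id}/orders'}
```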

Alternatively, you can use the DD_APM_REPLACE_TAGS environment variable with a JSON string as its value:

export DD_APM_REPLACE_TAGS='[{"name": "span.name", "pattern": "get_id_[0-9]+", "repl": "get_id_x"}, {...}, …]'
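
Writing the JSON value by hand is error-prone because of nested quoting. One possible approach is to generate the export line from a list of rules, as sketched below. The helper export_line is hypothetical, and the second rule is copied from the YAML example above to make the list complete.

```python
import json
import shlex

rules = [
    {"name": "span.name", "pattern": "get_id_[0-9]+", "repl": "get_id_x"},
    {"name": "resource.name", "pattern": "/users/[0-9]+/", "repl": "/users/{user_id}/"},
]


def export_line(rules: list) -> str:
    """Build a shell export statement carrying the rules as a JSON string."""
    payload = json.dumps(rules)
    # shlex.quote wraps the JSON in single quotes so the shell passes it through intact.
    return "export DD_APM_REPLACE_TAGS=" + shlex.quote(payload)


print(export_line(rules))
```

Note that the shell does not allow spaces around the = sign in an assignment.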
