Data Security

Overview

Datadog tracing libraries collect data from an instrumented application. That data is sent to Datadog as traces and it may contain sensitive data such as personally identifiable information (PII). If you are ingesting sensitive data as traces into Datadog, remediations can be added at ingestion with Sensitive Data Scanner. You can also configure the Datadog Agent or the tracing library to remediate sensitive data at collection before traces are sent to Datadog.

If the configurations described here do not cover your compliance requirements, reach out to the Datadog support team.

Personal information in trace data

Datadog’s APM tracing libraries collect relevant observability data about your applications. Because these libraries collect hundreds of unique attributes in trace data, this page describes categories of data, with a focus on attributes that may contain personal information about your employees and end-users.

The table below describes the personal data categories collected by the automatic instrumentation provided by the tracing libraries, with some common examples listed.

Category | Description
Name | The full name of an internal user (your employee) or end-user.
Email | The email address of an internal user (your employee) or end-user.
Client IP | The IP address of your end-user associated with an incoming request or the external IP address of an outgoing request.
Database statements | The literal, sequence of literals, or bind variables used in an executed database statement.
Geographic location | Longitude and latitude coordinates that can be used to identify an individual or household.
URI parameters | The parameter values in the variable part of the URI path or the URI query.
URI userinfo | The userinfo subcomponent of the URI that may contain the user name.
Login ID | Can include an account/user ID, name, or email address.

The table below describes the default behavior of each language tracing library with regard to whether a data category is collected and whether it is obfuscated by default.

Category | Collected | Obfuscated
Name
Email
Client IP
Database statements
Geographic location
URI parameters
URI userinfo
Login ID

Note: Database statements are not collected by default and must be enabled.

Category | Collected | Obfuscated
Name
Email
Client IP
Database statements
Geographic location
URI parameters
URI userinfo
Login ID

Note: URI parameters are not collected by default and must be enabled.

Category | Collected | Obfuscated
Name
Email
Client IP
Database statements
Geographic location
URI parameters
URI userinfo
Login ID

Note: Name and email are not collected by default and must be enabled.

Category | Collected | Obfuscated
Name
Email
Client IP
Database statements
Geographic location
URI parameters
URI userinfo
Login ID

Note: Client IP, geographic location, and URI parameters are not collected by default and must be enabled.

Category | Collected | Obfuscated
Name
Email
Client IP
Database statements
Geographic location
URI parameters
URI userinfo
Login ID

Note: Client IPs are not collected by default and must be enabled.

Category | Collected | Obfuscated
Name
Email
Client IP
Database statements
Geographic location
URI parameters
URI userinfo
Login ID

Note: Client IPs are not collected by default and must be enabled. Database statements are obfuscated by the Datadog Agent.

Category | Collected | Obfuscated
Name
Email
Client IP
Database statements
Geographic location
Client URI path
Client URI query string
Server URI path
Server URI query string
HTTP body
HTTP cookies
HTTP headers
Login ID

If you use Datadog Application Security Management (ASM), the tracing libraries collect HTTP request data to help you understand the nature of a security trace. Datadog ASM automatically redacts certain data, and you can configure your own detection rules. Learn more about these defaults and configuration options in the Datadog ASM data privacy documentation.

Agent

Resource names

Datadog spans include a resource name attribute that may contain sensitive data. The Datadog Agent implements obfuscation of resource names for several known cases:

  • SQL numeric literals and bind variables are obfuscated: For example, the following query SELECT data FROM table WHERE key=123 LIMIT 10 is obfuscated to SELECT data FROM table WHERE key = ? LIMIT ? before setting the resource name for the query span.
  • SQL literal strings are identified using standard ANSI SQL quotes: This means strings should be surrounded by single quotes ('). Some SQL variants optionally support double quotes (") for strings, but most treat double-quoted text as identifiers. The Datadog obfuscator treats double-quoted text as identifiers rather than strings and does not obfuscate it.
  • Redis queries are quantized by selecting only command tokens: For example, the following query MULTI\nSET k1 v1\nSET k2 v2 is quantized to MULTI SET SET.
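
As a rough illustration of the SQL and Redis cases above, here is a minimal Python sketch. It is not the Agent's implementation: the function names and regular expressions are assumptions made for this example, and unlike the Agent output shown above, the sketch does not normalize whitespace around operators.

```python
import re

# Single-quoted ANSI SQL string literals ('' is an escaped quote inside a literal).
_SQL_STRING = re.compile(r"'(?:[^']|'')*'")
# Standalone numeric literals such as 123 or 4.5; digits inside identifiers (k1) are skipped.
_SQL_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")


def obfuscate_sql(query: str) -> str:
    """Replace string and numeric literals with '?'. Double-quoted text is left untouched."""
    query = _SQL_STRING.sub("?", query)
    return _SQL_NUMBER.sub("?", query)


def quantize_redis(query: str) -> str:
    """Keep only the command token (first word) of each line in a Redis query."""
    commands = [line.split()[0] for line in query.splitlines() if line.strip()]
    return " ".join(commands)


print(obfuscate_sql("SELECT data FROM table WHERE key=123 LIMIT 10"))
print(quantize_redis("MULTI\nSET k1 v1\nSET k2 v2"))
```

Because double-quoted text is treated as an identifier, a value written as "secret" would pass through this sketch unchanged, which mirrors the caveat in the second bullet.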

Trace obfuscation

The Datadog Agent also obfuscates sensitive trace data that is not within the resource name. You can configure the obfuscation rules using environment variables or the datadog.yaml configuration file.

The following metadata can be obfuscated:

  • MongoDB queries
  • ElasticSearch request bodies
  • Redis commands
  • MemCached commands
  • HTTP URLs
  • Stack traces

Note: Obfuscation can have a performance impact on your system, or could redact important information that is not sensitive. Consider what obfuscation you need for your setup, and customize your configuration appropriately.

Note: You can use automatic scrubbing for multiple types of services at the same time. Configure each in the obfuscation section of your datadog.yaml file.

MongoDB queries within a span of type mongodb are obfuscated by default.

apm_config:
  enabled: true

  ## (...)

  obfuscation:
    mongodb:
      ## Configures obfuscation rules for spans of type "mongodb". Enabled by default.
      enabled: true
      keep_values:
        - document_id
        - template_id
      obfuscate_sql_values:
        - val1

This can also be disabled with the environment variable DD_APM_OBFUSCATION_MONGODB_ENABLED=false.

  • keep_values or environment variable DD_APM_OBFUSCATION_MONGODB_KEEP_VALUES - defines a set of keys to exclude from Datadog Agent trace obfuscation. If not set, all keys are obfuscated.
  • obfuscate_sql_values or environment variable DD_APM_OBFUSCATION_MONGODB_OBFUSCATE_SQL_VALUES - defines a set of keys to include in Datadog Agent trace obfuscation. If not set, all keys are obfuscated.
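
To make the effect of keep_values more concrete, the following Python sketch obfuscates a JSON query document while preserving the values of selected keys. It is a simplified model written for this page, not the Agent's code: the function name is hypothetical, every non-kept value is assumed to become "?", and obfuscate_sql_values is not modeled.

```python
import json


def obfuscate_json(document: str, keep_values=()) -> str:
    """Replace every value with '?' except values stored under a key in keep_values."""
    keep = set(keep_values)

    def scrub(value, kept):
        if kept:
            return value  # value belongs to a kept key: leave it as-is
        if isinstance(value, dict):
            return {k: scrub(v, k in keep) for k, v in value.items()}
        if isinstance(value, list):
            return [scrub(v, False) for v in value]
        return "?"

    return json.dumps(scrub(json.loads(document), False))


query = '{"document_id": "abc", "name": "Alice", "filter": {"age": 30}}'
print(obfuscate_json(query, keep_values=["document_id"]))
```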

ElasticSearch request bodies within a span of type elasticsearch are obfuscated by default.

apm_config:
  enabled: true

  ## (...)

  obfuscation:
    elasticsearch:
      ## Configures obfuscation rules for spans of type "elasticsearch". Enabled by default.
      enabled: true
      keep_values:
        - client_id
        - product_id
      obfuscate_sql_values:
        - val1

This can also be disabled with the environment variable DD_APM_OBFUSCATION_ELASTICSEARCH_ENABLED=false.

  • keep_values or environment variable DD_APM_OBFUSCATION_ELASTICSEARCH_KEEP_VALUES - defines a set of keys to exclude from Datadog Agent trace obfuscation. If not set, all keys are obfuscated.
  • obfuscate_sql_values or environment variable DD_APM_OBFUSCATION_ELASTICSEARCH_OBFUSCATE_SQL_VALUES - defines a set of keys to include in Datadog Agent trace obfuscation. If not set, all keys are obfuscated.

Redis commands within a span of type redis are obfuscated by default.

apm_config:
  enabled: true

  ## (...)

  obfuscation:
    ## Configures obfuscation rules for spans of type "redis". Enabled by default.
    redis:
      enabled: true
      remove_all_args: true

This can also be disabled with the environment variable DD_APM_OBFUSCATION_REDIS_ENABLED=false.

  • remove_all_args or environment variable DD_APM_OBFUSCATION_REDIS_REMOVE_ALL_ARGS - replaces all arguments of a redis command with a single “?” if true. Disabled by default.
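
The effect of remove_all_args can be pictured with a small Python sketch. The function name is hypothetical, and leaving argument-less commands unchanged is an assumption of this example rather than documented Agent behavior.

```python
def remove_all_redis_args(command: str) -> str:
    """Collapse all arguments of a single Redis command into one '?'."""
    parts = command.split()
    if not parts:
        return ""
    if len(parts) == 1:
        # Assumption: a command without arguments has nothing to replace.
        return parts[0]
    return f"{parts[0]} ?"


print(remove_all_redis_args("SET k1 v1"))
```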

MemCached commands within a span of type memcached are obfuscated by default.

apm_config:
  enabled: true

  ## (...)

  obfuscation:
    memcached:
      ## Configures obfuscation rules for spans of type "memcached". Enabled by default.
      enabled: true

This can also be disabled with the environment variable DD_APM_OBFUSCATION_MEMCACHED_ENABLED=false.

HTTP URLs within a span of type http or web are not obfuscated by default.

Note: Passwords within the Userinfo of a URL are not collected by Datadog.

apm_config:
  enabled: true

  ## (...)

  obfuscation:
    http:
      ## Enables obfuscation of query strings in URLs. Disabled by default.
      remove_query_string: true
      remove_paths_with_digits: true

  • remove_query_string or environment variable DD_APM_OBFUSCATION_HTTP_REMOVE_QUERY_STRING: If true, obfuscates query strings in URLs (http.url).
  • remove_paths_with_digits or environment variable DD_APM_OBFUSCATION_HTTP_REMOVE_PATHS_WITH_DIGITS: If true, path segments in URLs (http.url) containing only digits are replaced by “?”.
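
The two HTTP options can be sketched in Python as follows. This is an approximation for illustration only: the function name is hypothetical, the query string is simply dropped (the exact form the Agent emits may differ), and the digits rule follows the wording above, replacing segments that consist only of digits.

```python
from urllib.parse import urlsplit, urlunsplit


def obfuscate_url(url: str, remove_query_string: bool = False,
                  remove_paths_with_digits: bool = False) -> str:
    parts = urlsplit(url)
    path = parts.path
    if remove_paths_with_digits:
        # Replace every path segment made up only of digits with "?".
        path = "/".join("?" if seg.isdigit() else seg for seg in path.split("/"))
    query = "" if remove_query_string else parts.query
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


print(obfuscate_url("https://example.com/users/123/orders?token=abc",
                    remove_query_string=True, remove_paths_with_digits=True))
```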

Stack trace removal is disabled by default. Set the remove_stack_traces parameter to true to remove stack traces and replace them with ?.

apm_config:
  enabled: true

  ## (...)

  obfuscation:
    ## Enables removing stack traces to replace them with "?". Disabled by default.
    remove_stack_traces: true # default false

This can also be enabled with the environment variable DD_APM_OBFUSCATION_REMOVE_STACK_TRACES=true.

Replace tags

To scrub sensitive data from your span’s tags, use the replace_tags setting in your datadog.yaml configuration file or the DD_APM_REPLACE_TAGS environment variable. The value of the setting or environment variable is a list of one or more groups of parameters that specify how to replace sensitive data in your tags. These parameters are:

  • name: The key of the tag to replace. To match all tags, use *. To match the resource, use resource.name.
  • pattern: The regexp pattern to match against.
  • repl: The replacement string.

For example:

apm_config:
  replace_tags:
    # Replace all characters starting at the `token/` string in the tag "http.url" with "?"
    - name: "http.url"
      pattern: "token/(.*)"
      repl: "?"
    # Remove trailing "/" character in resource names
    - name: "resource.name"
      pattern: "(.*)\/$"
      repl: "$1"
    # Replace all the occurrences of "foo" in any tag with "bar"
    - name: "*"
      pattern: "foo"
      repl: "bar"
    # Remove all "error.stack" tag's value
    - name: "error.stack"
      pattern: "(?s).*"
    # Replace series of numbers in error messages
    - name: "error.message"
      pattern: "[0-9]{10}"
      repl: "[REDACTED]"

DD_APM_REPLACE_TAGS=[
      {
        "name": "http.url",
        "pattern": "token/(.*)",
        "repl": "?"
      },
      {
        "name": "resource.name",
        "pattern": "(.*)\/$",
        "repl": "$1"
      },
      {
        "name": "*",
        "pattern": "foo",
        "repl": "bar"
      },
      {
        "name": "error.stack",
        "pattern": "(?s).*"
      },
      {
        "name": "error.message",
        "pattern": "[0-9]{10}",
        "repl": "[REDACTED]"
      }
]

Set the DD_APM_REPLACE_TAGS environment variable:

  • For Datadog Operator, in override.nodeAgent.env in your datadog-agent.yaml
  • For Helm, in agents.containers.traceAgent.env in your datadog-values.yaml
  • For manual configuration, in the trace-agent container section of your manifest

- name: DD_APM_REPLACE_TAGS
  value: '[
            {
              "name": "http.url",
              "pattern": "token/(.*)",
              "repl": "?"
            },
            {
              "name": "resource.name",
              "pattern": "(.*)\/$",
              "repl": "$1"
            },
            {
              "name": "*",
              "pattern": "foo",
              "repl": "bar"
            },
            {
              "name": "error.stack",
              "pattern": "(?s).*"
            },
            {
              "name": "error.message",
              "pattern": "[0-9]{10}",
              "repl": "[REDACTED]"
            }
          ]'

Examples

Datadog Operator:

apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog
spec:
  override:
    nodeAgent:
      env:
        - name: DD_APM_REPLACE_TAGS
          value: '[
                   {
                     "name": "http.url",
                  # (...)
                  ]'

Helm:

agents:
  containers:
    traceAgent:
      env:
        - name: DD_APM_REPLACE_TAGS
          value: '[
                   {
                     "name": "http.url",
                  # (...)
                  ]'

Docker Compose:

- DD_APM_REPLACE_TAGS=[{"name":"http.url","pattern":"token/(.*)","repl":"?"},{"name":"resource.name","pattern":"(.*)\/$","repl":"$1"},{"name":"*","pattern":"foo","repl":"bar"},{"name":"error.stack","pattern":"(?s).*"},{"name":"error.message","pattern":"[0-9]{10}","repl":"[REDACTED]"}]
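
To show how the example rules behave, here is a minimal Python sketch that applies name/pattern/repl rules to a dictionary of tags. It is an illustration under several assumptions, not the Agent's code: the Agent evaluates patterns with Go regular expressions, whose syntax differs from Python's in places; the helper name is hypothetical; the resource name is modeled as an ordinary resource.name key; and a missing repl is treated as an empty string.

```python
import re

RULES = [
    {"name": "http.url", "pattern": "token/(.*)", "repl": "?"},
    {"name": "resource.name", "pattern": "(.*)/$", "repl": "$1"},
    {"name": "*", "pattern": "foo", "repl": "bar"},
    {"name": "error.stack", "pattern": "(?s).*"},
    {"name": "error.message", "pattern": "[0-9]{10}", "repl": "[REDACTED]"},
]


def apply_replace_tags(tags: dict, rules: list) -> dict:
    """Apply each rule, in order, to every tag whose key matches the rule name."""
    result = dict(tags)
    for rule in rules:
        pattern = re.compile(rule["pattern"])
        # Convert Go-style group references ($1) to Python's \g<1> form.
        repl = re.sub(r"\$(\d+)", r"\\g<\1>", rule.get("repl", ""))
        for key in result:
            if rule["name"] in ("*", key):
                result[key] = pattern.sub(repl, result[key])
    return result


tags = {
    "http.url": "http://example.com/token/abc123",
    "resource.name": "GET /users/",
    "error.message": "card 1234567890 declined",
}
print(apply_replace_tags(tags, RULES))
```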

Ignore resources

For an in-depth overview of the options to avoid tracing specific resources, see Ignoring Unwanted Resources.

If your services include simulated traffic such as health checks, you may want to exclude these traces from being collected so the metrics for your services match production traffic.

The Agent can be configured to exclude a specific resource from traces sent by the Agent to Datadog. To prevent the submission of specific resources, use the ignore_resources setting in the datadog.yaml file. Then create a list of one or more regular expressions, specifying which resources the Agent filters out based on their resource name.

If you are running in a containerized environment, set DD_APM_IGNORE_RESOURCES on the container with the Datadog Agent instead. See the Docker APM Agent environment variables for details.

## @param ignore_resources - list of strings - optional
## A list of regular expressions can be provided to exclude certain traces based on their resource name.
## All entries must be surrounded by double quotes and separated by commas.
# ignore_resources: ["(GET|POST) /healthcheck","API::NotesController#index"]
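
The filtering that ignore_resources describes can be approximated in Python as follows. This is a sketch only: the helper name is hypothetical, and it assumes an unanchored regular expression search against the resource name, using Python regex syntax rather than the Agent's.

```python
import re

# The two patterns from the example configuration above.
IGNORE_RESOURCES = [r"(GET|POST) /healthcheck", r"API::NotesController#index"]


def is_ignored(resource_name: str, patterns=IGNORE_RESOURCES) -> bool:
    """Return True if any pattern matches somewhere in the resource name."""
    return any(re.search(p, resource_name) for p in patterns)


for name in ["GET /healthcheck", "GET /users", "API::NotesController#index"]:
    print(name, "ignored" if is_ignored(name) else "kept")
```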

Library

HTTP

Datadog is standardizing span tag semantics across tracing libraries. Information from HTTP requests is added as span tags with the http. prefix. The libraries have the following configuration options to control sensitive data collected in HTTP spans.

Redact query strings

The http.url tag is assigned the full URL value, including the query string. The query string could contain sensitive data, so by default Datadog parses it and redacts suspicious-looking values. This redaction process is configurable. To modify the regular expression used for redaction, set the DD_TRACE_OBFUSCATION_QUERY_STRING_REGEXP environment variable to a valid regex of your choice. Valid regex is platform-specific. When the regex finds a suspicious key-value pair, it replaces it with <redacted>.

If you do not want to collect the query string, set the DD_HTTP_SERVER_TAG_QUERY_STRING environment variable to false. The default value is true.
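
As an illustration of regex-based query string redaction, the following Python sketch replaces matching key-value pairs with <redacted>. The pattern shown is a hypothetical example written for this page, not the default used by any tracing library, and the function name is likewise an assumption.

```python
import re

# Hypothetical pattern: key-value pairs whose key looks like a credential.
QUERY_STRING_REGEXP = re.compile(r"(?i)(?:password|token|api_key)=[^&]+")


def redact_query_string(query: str) -> str:
    """Replace each suspicious key-value pair with the <redacted> marker."""
    return QUERY_STRING_REGEXP.sub("<redacted>", query)


print(redact_query_string("user=bob&password=hunter2&page=2"))
```

A custom expression supplied through DD_TRACE_OBFUSCATION_QUERY_STRING_REGEXP would play the role of QUERY_STRING_REGEXP here, subject to the regex dialect of the platform in use.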

Collect headers

To collect trace header tags, set the DD_TRACE_HEADER_TAGS environment variable with a map of case-insensitive header keys to tag names. The library applies matching header values as tags on root spans. The setting also accepts entries without a specified tag name, for example:

DD_TRACE_HEADER_TAGS=CASE-insensitive-Header:my-tag-name,User-ID:userId,My-Header-And-Tag-Name
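
The structure of this value can be pictured with a short Python sketch that parses it into a mapping. The function name is hypothetical, and the sketch records None for entries without an explicit tag name instead of guessing the tag name a given library would assign.

```python
def parse_header_tags(value: str) -> dict:
    """Parse 'Header:tag,Header2' into {lowercased header: tag name or None}."""
    mapping = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        header, sep, tag = entry.partition(":")
        # Header keys are case-insensitive, so normalize them to lowercase.
        mapping[header.strip().lower()] = tag.strip() if sep else None
    return mapping


print(parse_header_tags(
    "CASE-insensitive-Header:my-tag-name,User-ID:userId,My-Header-And-Tag-Name"
))
```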

Processing

Some tracing libraries provide an interface for processing spans to manually modify or remove sensitive data collected in traces.

Telemetry collection

Instrumentation telemetry is not available for the site, but is enabled by default. To avoid errors, users should disable this capability by setting DD_INSTRUMENTATION_TELEMETRY_ENABLED=false on their application and DD_APM_TELEMETRY_ENABLED=false on their Agent.

Datadog may gather environmental and diagnostic information about your tracing libraries for processing; this may include information about the host running an application, operating system, programming language and runtime, APM integrations used, and application dependencies. Additionally, Datadog may collect information such as diagnostic logs, crash dumps with obfuscated stack traces, and various system performance metrics.

You can disable this telemetry collection using either of these settings:

apm_config:
  telemetry:
    enabled: false

export DD_APM_TELEMETRY_ENABLED=false

PCI DSS compliance for APM

PCI compliance for APM is only available for Datadog organizations in the US1 site.

To set up PCI-compliant Application Performance Monitoring, you must meet the following requirements:

  • Audit Trail must be enabled and remain enabled for PCI DSS compliance. If you haven’t already enabled Audit Trail, it is automatically enabled once the org is configured as PCI-compliant (after following the steps below).
  • Your Datadog organization is in the US1 site.
  • All spans are sent to the PCI endpoints using HTTPS only. If you are using the Agent to send spans, you should enforce HTTPS transport.
  • All your span endpoints must be changed to the PCI endpoints for spans.
  • You may request access to the PCI Attestation of Compliance and Customer Responsibility Matrix on Datadog’s Trust Center. Note that these documents are only applicable once you have finished all the onboarding steps and have been manually configured to be compliant by Datadog support.

To begin onboarding:

  1. Contact Datadog support or your Customer Success Manager to request to begin the PCI onboarding process while ensuring the necessary PCI requirements are met.
  2. After Datadog support or Customer Success confirms that the org is PCI DSS compliant, configure the respective configuration file to send spans to the dedicated PCI-compliant endpoint, https://trace-pci.agent.datadoghq.com, which is used for both Agent and non-Agent traffic. For example, add the following lines to the Agent configuration file:

apm_config:
  apm_dd_url: https://trace-pci.agent.datadoghq.com

  3. All spans that are sent to the PCI-compliant endpoint(s) automatically have a set of Sensitive Data Scanner PCI rules applied to scrub any cardholder data. These dedicated PCI rules must be enabled for PCI DSS compliance and are included at no additional charge.

To finish onboarding and be marked as compliant:

  1. Inform Datadog support or your Customer Success Manager that you have moved all your endpoints to the PCI-compliant endpoint(s).
  2. Once confirmed by Datadog, your span configuration and Application Performance Monitoring are considered PCI-compliant.

If you have any questions about how your now PCI-compliant Application Performance Monitoring satisfies the applicable requirements under PCI DSS, contact your account manager.

See PCI DSS Compliance for more information. To enable PCI compliance for logs, see PCI DSS compliance for Log Management.
