(LEGACY) Working with Data

Documentos > Observability Pipelines > (LEGACY) Observability Pipelines Documentation > (LEGACY) Working with Data

This page is not yet available in Spanish. We are working on its translation.
If you have any questions or feedback about our current translation project, feel free to reach out to us!

Overview

Observability Pipelines enables you to shape and transform observability data. Similar to Logging without Limits™ pipelines, you can configure pipelines for Observability Pipelines that are composed of a series of transform components. These transforms allow you to parse, structure, and enrich data with built-in type safety.

Remap data

The remap transform can modify events or specify conditions for routing and filtering events. Use Datadog Processing Language (DPL), or Vector Remap Language (VRL), in the remap transform to manipulate arrays and strings, encode and decode values, encrypt and decrypt values, and more. See Datadog Processing Language for more information and the DPL Functions reference for a full list of DPL built-in functions.

Basic `remap` configuration example

To get started, see the following YAML configuration example for a basic remap transform that contains a DPL/VRL program in the source field:

transforms:
  modify:
    type: remap
    inputs:
      - previous_component_id
    source: |2
        del(.user_info)
        .timestamp = now()

In this example, the type field is set to a remap transform. The inputs field defines where it receives events from the previously defined previous_component_id source. The first line in the source field deletes the .user_info field. At scale, dropping fields is particularly useful for reducing the payload of your events and cutting down on spend for your downstream services.

The second line adds the .timestamp field and the value to the event, changing the content of every event that passes through this transform.

Parse data

Parsing provides more advanced use cases of DPL/VRL.

Parsing example

Log event example

The below snippet is an HTTP log event in JSON format:

"{\"status\":200,\"timestamp\":\"2021-03-01T19:19:24.646170Z\",\"message\":\"SUCCESS\",\"username\":\"ub40fan4life\"}"

Configuration example

The following YAML configuration example uses DPL/VRL to modify the log event by:

Parsing the raw string into JSON.
Reformatting the time into a UNIX timestamp.
Removing the username field.
Converting the message to lowercase.

transforms:
  parse_syslog_id:
    type: remap
    inputs:
      - previous_component_id
    source: |2
         . = parse_json!(string!(.message))
         .timestamp = to_unix_timestamp(to_timestamp!(.timestamp))
         del(.username)
         .message = downcase(string!(.message))

Configuration output

The configuration returns the following:

{
  "message": "success",
  "status": 200,
  "timestamp": 1614626364
}

Sample, reduce, filter, and aggregate data

Sampling, reducing, filtering, and aggregating are common transforms to reduce the volume of observability data delivered to downstream services. Observability Pipelines offers a variety of ways to control your data volume:

Sample events based on supplied criteria and at a configurable rate.
Reduce and collapse multiple events into a single event.
Remove unnecessary fields.
Deduplicate events.
Filter events based on a set of conditions.
Aggregate multiple metric events into a single metric event based on a defined interval window.

See Control Log Volume and Size for examples on how to use these transforms.

Route data

Another commonly used transform is route, which allows you to split a stream of events into multiple substreams based on supplied conditions. This is useful when you need to send observability data to different destinations or operate differently on streams of data based on their use case.

Routing to different destinations example

Log example

The below snippet is an example log that you want to route to different destinations based on the value of the level field.

{
  "logs": {
    "kind": "absolute",
    "level": "info,
    "name": "memory_available_bytes",
    "namespace": "host",
    "tags": {}
  }
}

Configuration examples

The following YAML configuration example routes data based on the level value:

transforms:
  splitting_logs_id:
    type: route
    inputs:
      - my-source-or-transform-id
    route:
      debug: .level == "debug"
      info: .level == "info"
      warn: .level == "warn"
      error: .level == "error"

Each row under the route field defines a route identifier, followed by a logical condition representing the filter of the route. The end result of this route can then be referenced as an input by other components with the name <transform_name>.<route_id>.

For example, if you wish to route logs with level field values of warn and error to Datadog, see the following example:

sinks:
  my_sink_id:
    type: datadog_logs
    inputs:
      - splitting_logs_id.warn
      - splitting_logs_id.error
    default_api_key: '${DATADOG_API_KEY_ENV_VAR}'
    compression: gzip

See the route transform reference for more information.

Throttle data

Downstream services can sometimes get overwhelmed when there is a spike in volume, which can lead to data being dropped. Use the throttle transform to safeguard against this scenario and also enforce usage quotas on users. The throttle transform rate limits logs passing through a topology.

Throttle configuration example

The following YAML configuration example is for a throttle transform:

transforms:
  my_transform_id:
    type: throttle
    inputs:
      - my-source-or-transform-id
    exclude: null
    threshold: 100
    window_secs: 1

The threshold field defines the number of events allowed for a given bucket. window_secs defines the time frame in which the configured threshold is applied. In the example configuration, when the component receives more than 100 events in a span of 1 second, any additional events are dropped.