Data Pipeline Lineage


Join the Preview!

Data Pipeline Lineage is available in Preview. If you're interested in this feature, complete the form to request access.


Datadog’s Data Pipeline Lineage helps you monitor data flow throughout your pipelines end-to-end, including ingestion, processing, and storage. With expanded visibility into your streaming data pipelines, data jobs, and data warehouses in a unified view, you can detect issues with your data, identify related upstream failures, and troubleshoot faster.

You can visualize the lineage of data between components (streaming data, data processing jobs, and data warehouses) along with their upstream and downstream dependencies, monitor throughput, and detect issues such as consumer lag and schema changes, together with the downstream data they affect.

This feature requires both Data Streams Monitoring and Data Jobs Monitoring.

Supported technologies

Type	Technology
Streaming	Java producer/consumer services, Kafka, RabbitMQ, SQS, SNS, Kinesis
Processing	Apache Spark jobs running on Kubernetes, Apache Spark jobs running on EMR on EKS
Storage	S3, Snowflake

Don’t see your tech stack here? Submit a request.

Setup

Set up Data Streams Monitoring on your producer and consumer services. Follow the instructions in the Data Streams Monitoring setup documentation. If you are using Java, ensure that you use the Datadog APM client for Java v1.34.0+.
Set up Data Jobs Monitoring on your Spark workloads. See the instructions for Spark on Kubernetes or Spark on EMR.

Enable Data Streams Monitoring for your Spark jobs. Add -Ddd.data.streams.enabled=true to your spark-submit command line.

For example:

spark-submit \
  --conf spark.driver.extraJavaOptions="-Ddd.data.jobs.enabled=true -Ddd.data.streams.enabled=true" \
  --conf spark.executor.extraJavaOptions="-Ddd.data.jobs.enabled=true -Ddd.data.streams.enabled=true" \
  application.jar

For Snowflake services, install APM clients. Install Datadog’s Java or Python APM client on every service that interacts with Snowflake, and set the DD_TRACE_REMOVE_INTEGRATION_SERVICE_NAMES_ENABLED environment variable to true for those services.
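The service-side settings from the steps above can be sketched as a small launch wrapper. This is a minimal sketch, assuming the services are started from a shell. DD_DATA_STREAMS_ENABLED is the environment-variable form of the dd.data.streams.enabled property used in the spark-submit example. The agent path, JAR name, and script name in the commented launch lines are hypothetical placeholders, not values from this page.

```shell
#!/bin/sh
# Enable Data Streams Monitoring for a producer or consumer service.
export DD_DATA_STREAMS_ENABLED=true

# Needed for services that interact with Snowflake (see the step above).
export DD_TRACE_REMOVE_INTEGRATION_SERVICE_NAMES_ENABLED=true

# Hypothetical launch commands: uncomment and adapt the one that matches your service.
# Java: attach the Datadog APM client as a Java agent.
# java -javaagent:/path/to/dd-java-agent.jar -jar my-service.jar
# Python: run the service under the Datadog APM client.
# ddtrace-run python my_service.py
```

In containerized deployments, the same two variables would typically be set in the container or pod environment rather than in a wrapper script.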

View your pipelines in Datadog

Figure: The Map view in Data Streams Monitoring, with a pipeline visualization showing data flow from left to right.

After you set up Data Pipeline Lineage, go to the Data Streams Monitoring page in Datadog and select Map to see your visualized pipelines.
