Join the Preview!
Data Pipeline Lineage is available in Preview. If you're interested in this feature, complete the form to request access.
Datadog’s Data Pipeline Lineage helps you monitor data flow throughout your pipelines end-to-end, including ingestion, processing, and storage. With expanded visibility into your streaming data pipelines, data jobs, and data warehouses in a unified view, you can detect issues with your data, identify related upstream failures, and troubleshoot faster.
You can visualize the lineage of data between components (streaming data, data processing jobs, and data warehouses) with their upstream and downstream dependencies, monitor throughput, and detect issues such as consumer lag and schema changes, along with the downstream data they affect.
This feature requires both Data Streams Monitoring and Data Jobs Monitoring.
Supported technologies
| Type | Technology |
|---|---|
| Streaming | Java producer/consumer services, Kafka, RabbitMQ, SQS, SNS, Kinesis |
| Processing | Apache Spark jobs running on Kubernetes, Apache Spark jobs running on EMR on EKS |
| Storage | |
Don’t see your tech stack here? Submit a request.
Setup
Set up Data Streams Monitoring on your producer and consumer services. Follow the instructions in the Data Streams Monitoring setup documentation. If you are using Java, ensure that you use the Datadog APM client for Java v1.34.0+.
Set up Data Jobs Monitoring on your Spark workloads. See the instructions for Spark on Kubernetes or Spark on EMR.
Enable Data Streams Monitoring for your Spark jobs. Add -Ddd.data.streams.enabled=true to your spark-submit command line.
For example:
spark-submit \
--conf spark.driver.extraJavaOptions="-Ddd.data.jobs.enabled=true -Ddd.data.streams.enabled=true" \
--conf spark.executor.extraJavaOptions="-Ddd.data.jobs.enabled=true -Ddd.data.streams.enabled=true" \
application.jar
For Snowflake services, install APM clients. Install Datadog’s Java or Python APM client for any services that interact with Snowflake. Set the DD_TRACE_REMOVE_INTEGRATION_SERVICE_NAMES_ENABLED environment variable to true.
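As a minimal sketch of the Snowflake step above, the environment variable can be exported in the shell that launches the service. The launch commands are left as comments because the agent path, JAR name, and script name (/path/to/dd-java-agent.jar, my-service.jar, my_service.py) are hypothetical placeholders. The -javaagent flag and ddtrace-run are the usual ways to attach Datadog's Java and Python APM clients, but confirm the details against the APM client documentation for your environment.

```shell
# Required for services that interact with Snowflake (from the step above).
export DD_TRACE_REMOVE_INTEGRATION_SERVICE_NAMES_ENABLED=true

# Java service (hypothetical paths): attach the Datadog Java APM client.
# java -javaagent:/path/to/dd-java-agent.jar -jar my-service.jar

# Python service (hypothetical script name): run under the Datadog Python APM client.
# ddtrace-run python my_service.py

# Confirm the variable is exported and visible to child processes.
printenv DD_TRACE_REMOVE_INTEGRATION_SERVICE_NAMES_ENABLED
```

In containerized deployments, the same variable would typically be set in the container's environment definition rather than through a shell export.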
View your pipelines in Datadog
After you set up Data Pipeline Lineage, go to the Data Streams Monitoring page in Datadog and select Map to see your visualized pipelines.