Data Jobs Monitoring gives visibility into the performance and reliability of Apache Spark applications on Amazon EMR.
Amazon EMR Release 6.0.1 or later is required.
Follow these steps to enable Data Jobs Monitoring for Amazon EMR.
Store your Datadog API key in AWS Secrets Manager. Add the key as a key-value pair whose key name is `dd_api_key`, and take note of the secret name, for example `datadog/dd_api_key`. Then, click Next. (A CLI sketch of this step appears after the policy below.)

The EMR EC2 instance profile is an IAM role assigned to every EC2 instance in an Amazon EMR cluster when the instance launches. Follow the Amazon guide to prepare this role based on your application's need to interact with other AWS services. The following additional permissions may be required for Data Jobs Monitoring:

- Permission to read your secret from AWS Secrets Manager: `secretsmanager:GetSecretValue`. This is a Read action. Grant it to the role you use as the instance profile, such as `EMR_EC2_DefaultRole`.
- Permission to describe the cluster, granted by the following policy:
```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "elasticmapreduce:ListBootstrapActions",
                "elasticmapreduce:ListInstanceFleets",
                "elasticmapreduce:DescribeCluster",
                "elasticmapreduce:ListInstanceGroups"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}
```
Take note of the name of the IAM role you plan to use as the instance profile for your EMR cluster.
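For reference, here is a minimal, non-authoritative sketch of these two steps with the AWS CLI. The policy names, the local file name `djm-emr-policy.json`, the region and account placeholders, and the role name `EMR_EC2_DefaultRole` are assumptions for illustration; substitute your own values:

```bash
# Store the Datadog API key as a key-value secret (assumed secret name: datadog/dd_api_key)
aws secretsmanager create-secret \
    --name datadog/dd_api_key \
    --secret-string '{"dd_api_key": "<YOUR_DATADOG_API_KEY>"}'

# Allow the instance profile role to read that secret (GetSecretValue, a Read action)
aws iam put-role-policy \
    --role-name EMR_EC2_DefaultRole \
    --policy-name datadog-djm-read-api-key \
    --policy-document '{
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "secretsmanager:GetSecretValue",
            "Resource": "arn:aws:secretsmanager:<REGION>:<ACCOUNT_ID>:secret:datadog/dd_api_key-*"
        }]
    }'

# Attach the describe/list policy shown above, saved locally as djm-emr-policy.json
aws iam put-role-policy \
    --role-name EMR_EC2_DefaultRole \
    --policy-name datadog-djm-describe-cluster \
    --policy-document file://djm-emr-policy.json
```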
When you create a new EMR cluster in the Amazon EMR console, add a bootstrap action on the Create Cluster page:
Save the following script to an S3 bucket that your EMR cluster can read. Take note of the path to this script.
```bash
#!/bin/bash

# Set required parameter DD_SITE to your Datadog site, for example datadoghq.com
DD_SITE=<YOUR_DD_SITE>

# Set required parameter DD_API_KEY with the Datadog API key.
# The command below assumes the API key is stored in AWS Secrets Manager, with the secret name datadog/dd_api_key and the key dd_api_key.
# IMPORTANT: Modify if you choose to manage and retrieve your secret differently.
SECRET_NAME=datadog/dd_api_key
DD_API_KEY=$(aws secretsmanager get-secret-value --secret-id $SECRET_NAME | jq -r .SecretString | jq -r '.["dd_api_key"]')

# Optional parameters
# Uncomment the following line to include init script logs when reporting a failure back to Datadog.
# A failure is reported when the init script fails to start the Datadog Agent successfully.
# export DD_DJM_ADD_LOGS_TO_FAILURE_REPORT=true

# Download and run the latest init script
DD_SITE=$DD_SITE DD_API_KEY=$DD_API_KEY bash -c "$(curl -L https://dd-data-jobs-monitoring-setup.s3.amazonaws.com/scripts/emr/emr_init_latest.sh)" || true
```
The script above sets the required parameters, then downloads and runs the latest init script for Data Jobs Monitoring in EMR. If you want to pin the script to a specific stable version, replace the file name in the URL, for example with `emr_init_1.4.0.sh`.
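As an illustration, assuming the script is saved locally as `datadog_emr_bootstrap.sh` (a hypothetical file name) and uploaded to a bucket you control, the S3 copy might look like:

```bash
# Upload the bootstrap script to an S3 location the EMR cluster can read
aws s3 cp datadog_emr_bootstrap.sh s3://<YOUR_BUCKET>/bootstrap/datadog_emr_bootstrap.sh
```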
On the Create Cluster page, find the Bootstrap actions section. Click Add to bring up the Add bootstrap action dialog.
Give your bootstrap action a name; you can use `datadog_agent`. For the script location, enter the S3 path to the script you saved in the previous step.

On the Create Cluster page, find the Identity and Access Management (IAM) roles section. In the instance profile dropdown, select the IAM role you granted permissions to in Grant permissions to EMR EC2 instance profile.
When your cluster is created, this bootstrap action installs the Datadog Agent and downloads the Java tracer on each node of the cluster.
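The console steps above also have a CLI equivalent. A sketch, assuming the bucket path from the upload example, an illustrative release label, and the default EMR roles; none of these values come from the guide itself, so adjust them to your environment:

```bash
# Create an EMR cluster with the Datadog bootstrap action and the prepared instance profile
aws emr create-cluster \
    --name "djm-example-cluster" \
    --release-label emr-6.10.0 \
    --applications Name=Spark \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --service-role EMR_DefaultRole \
    --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole \
    --bootstrap-actions Path=s3://<YOUR_BUCKET>/bootstrap/datadog_emr_bootstrap.sh,Name=datadog_agent
```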
Tagging enables you to better filter, aggregate, and compare your telemetry in Datadog. You can configure tags by passing the `-Ddd.service`, `-Ddd.env`, `-Ddd.version`, and `-Ddd.tags` options in your Spark driver and executor `extraJavaOptions` properties. In Datadog, each job's name corresponds to the value you set for `-Ddd.service`.
```shell
spark-submit \
    --conf spark.driver.extraJavaOptions="-Ddd.service=<JOB_NAME> -Ddd.env=<ENV> -Ddd.version=<VERSION> -Ddd.tags=<KEY_1>:<VALUE_1>,<KEY_2>:<VALUE_2>" \
    --conf spark.executor.extraJavaOptions="-Ddd.service=<JOB_NAME> -Ddd.env=<ENV> -Ddd.version=<VERSION> -Ddd.tags=<KEY_1>:<VALUE_1>,<KEY_2>:<VALUE_2>" \
    application.jar
```
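On EMR, jobs are often submitted as steps rather than by invoking spark-submit directly. A sketch of passing the same tagging option through `aws emr add-steps`; the cluster ID and JAR path are placeholders, and with several `-Ddd` options in one `extraJavaOptions` value you may prefer a JSON steps file to avoid shorthand quoting issues:

```bash
# Submit a Spark step that sets the Datadog service tag (placeholder cluster ID and JAR path)
aws emr add-steps \
    --cluster-id <CLUSTER_ID> \
    --steps 'Type=Spark,Name=<JOB_NAME>,ActionOnFailure=CONTINUE,Args=[--conf,spark.driver.extraJavaOptions=-Ddd.service=<JOB_NAME>,--conf,spark.executor.extraJavaOptions=-Ddd.service=<JOB_NAME>,s3://<YOUR_BUCKET>/application.jar]'
```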
In Datadog, view the Data Jobs Monitoring page to see a list of all your data processing jobs.
You can set tags on Spark spans at runtime. These tags are applied only to spans that start after the tag is added.
```scala
// Add a tag to all subsequent Spark computations
sparkContext.setLocalProperty("spark.datadog.tags.key", "value")
spark.read.parquet(...)
```
To remove a runtime tag:
```scala
// Remove the tag from all subsequent Spark computations
sparkContext.setLocalProperty("spark.datadog.tags.key", null)
```