Data Jobs Monitoring gives visibility into the performance and reliability of Apache Spark applications on Amazon EMR.
The EMR EC2 instance profile is an IAM role assigned to every EC2 instance in an Amazon EMR cluster when the instance launches. Follow the Amazon guide to prepare this role based on your application’s need to interact with other AWS services. The following additional permissions may be required for Data Jobs Monitoring.
When you create a new EMR cluster in the Amazon EMR console, add a bootstrap action on the Create Cluster page:
Save the following script to an S3 bucket that your EMR cluster can read. Take note of the path to this script.
#!/bin/bash

# Set required parameter DD_SITE
export DD_SITE=datadoghq.com

# Set required parameter DD_API_KEY with Datadog API key.
# The commands below assumes the API key is stored in AWS Secrets Manager, with the secret name as datadog/dd_api_key and the key as dd_api_key.
# IMPORTANT: Modify if you choose to manage and retrieve your secret differently.
SECRET_NAME=datadog/dd_api_key
export DD_API_KEY=$(aws secretsmanager get-secret-value --secret-id $SECRET_NAME | jq -r .SecretString | jq -r '.["dd_api_key"]')

# Optional: uncomment to send spark driver and worker logs to Datadog
# export DD_EMR_LOGS_ENABLED=true

# Download and run the latest init script
curl -L https://install.datadoghq.com/scripts/install-emr.sh > djm-install-script; bash djm-install-script || true
The script above sets the required parameters, then downloads and runs the latest init script for Data Jobs Monitoring in EMR. To pin the script to a specific version, replace the filename in the URL; for example, use install-emr-0.10.0.sh for version 0.10.0. The source code used to generate this script, and the changes between script versions, can be found in the Datadog Agent repository.
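As a rough illustration of version pinning, the sketch below builds the download URL from a version variable. The variable names INIT_SCRIPT_VERSION, SCRIPT_NAME, and SCRIPT_URL are hypothetical and are not read by the Datadog init script; only the base URL and the install-emr-0.10.0.sh filename pattern come from the text above.

```shell
#!/bin/bash
# Hypothetical variable: set to a released version to pin, or leave empty for latest.
INIT_SCRIPT_VERSION="0.10.0"

# Choose the filename following the pattern described above.
if [ -n "$INIT_SCRIPT_VERSION" ]; then
  SCRIPT_NAME="install-emr-${INIT_SCRIPT_VERSION}.sh"
else
  SCRIPT_NAME="install-emr.sh"
fi

SCRIPT_URL="https://install.datadoghq.com/scripts/${SCRIPT_NAME}"
echo "Init script URL: ${SCRIPT_URL}"

# In the bootstrap script, this URL would replace the unpinned one:
# curl -L "$SCRIPT_URL" > djm-install-script; bash djm-install-script || true
```

Pinning trades automatic updates for reproducibility, so a pinned cluster keeps the same init script until you change the version yourself.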
On the Create Cluster page, find the Bootstrap actions section. Click Add to bring up the Add bootstrap action dialog.
For Name, give your bootstrap action a name. You can use datadog_agent.
For Script location, enter the path to where you stored the init script in S3.
Click Add bootstrap action.
On the Create Cluster page, find the Identity and Access Management (IAM) roles section. In the Instance profile dropdown, select the IAM role to which you granted permissions in Grant permissions to EMR EC2 instance profile.
When your cluster is created, this bootstrap action installs the Datadog Agent and downloads the Java tracer on each node of the cluster.
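If you prefer the AWS CLI to the console, the same setup can probably be approximated as sketched below. This is an assumption-laden outline, not the documented procedure: the bucket name, object key, and local filename are hypothetical placeholders, and the AWS commands are left commented out so the snippet runs without credentials. It only composes the S3 path and the shorthand value accepted by the --bootstrap-actions option of aws emr create-cluster.

```shell
#!/bin/bash
# Hypothetical bucket and key: replace with a location your EMR cluster can read.
BUCKET="my-emr-bootstrap-bucket"
SCRIPT_KEY="scripts/datadog-djm-init.sh"
SCRIPT_S3_PATH="s3://${BUCKET}/${SCRIPT_KEY}"

# Shorthand value for the --bootstrap-actions option of `aws emr create-cluster`.
# The name datadog_agent matches the name suggested in the console steps.
BOOTSTRAP_ACTION="Path=${SCRIPT_S3_PATH},Name=datadog_agent"

echo "Script location:  ${SCRIPT_S3_PATH}"
echo "Bootstrap action: ${BOOTSTRAP_ACTION}"

# Upload the init script saved locally (hypothetical filename):
# aws s3 cp ./datadog-djm-init.sh "$SCRIPT_S3_PATH"
#
# Pass the bootstrap action along with your other cluster options:
# aws emr create-cluster --bootstrap-actions "$BOOTSTRAP_ACTION" ...
```

The instance profile selected for the cluster still needs the permissions described earlier, whichever way the cluster is created.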
Tagging enables you to better filter, aggregate, and compare your telemetry in Datadog. You can configure tags by passing -Ddd.service, -Ddd.env, -Ddd.version, and -Ddd.tags options to your Spark driver and executor extraJavaOptions properties.
In Datadog, each job’s name corresponds to the value you set for -Ddd.service.
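The tagging options above could be passed at submit time roughly as follows. This is a minimal sketch: the service name, environment, version, tag values, and the my_job.py application file are made-up examples, and the shell variable names are hypothetical. The spark.driver.extraJavaOptions and spark.executor.extraJavaOptions properties are standard Spark settings, and the spark-submit call is commented out so the snippet runs anywhere.

```shell
#!/bin/bash
# Hypothetical example values: replace with your own.
SERVICE_NAME="my-spark-job"
ENV_NAME="prod"
APP_VERSION="1.2.0"
EXTRA_TAGS="team:data-eng,owner:analytics"

# Datadog tagging options shared by the driver and the executors.
DD_JAVA_OPTS="-Ddd.service=${SERVICE_NAME} -Ddd.env=${ENV_NAME} -Ddd.version=${APP_VERSION} -Ddd.tags=${EXTRA_TAGS}"

DRIVER_CONF="spark.driver.extraJavaOptions=${DD_JAVA_OPTS}"
EXECUTOR_CONF="spark.executor.extraJavaOptions=${DD_JAVA_OPTS}"

echo "--conf \"${DRIVER_CONF}\""
echo "--conf \"${EXECUTOR_CONF}\""

# Submit the application with both properties set:
# spark-submit --conf "$DRIVER_CONF" --conf "$EXECUTOR_CONF" my_job.py
```

With these values, the job would be expected to appear in Datadog under the name my-spark-job, since the job name follows -Ddd.service.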