Data Jobs Monitoring for Spark on Kubernetes

Data Jobs Monitoring gives visibility into the performance and reliability of Apache Spark applications on Kubernetes.

Setup

Data Jobs Monitoring requires Datadog Agent version 7.55.0 or later, and Java tracer version 1.38.0 or later.
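
If you want to check these minimums from a script, a small version comparison can help. The following is a minimal sketch, assuming plain dotted version strings and a `sort` that supports `-V` (as in GNU coreutils). The names `version_ge`, `AGENT_VERSION`, and `TRACER_VERSION` are hypothetical, and the sample values are placeholders, not output from a real cluster.

```shell
# Succeeds (exit 0) when version $1 is greater than or equal to version $2.
# Assumption: both are plain dotted versions such as 7.55.0, and sort supports -V.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical values: substitute the versions you actually run.
AGENT_VERSION="7.56.2"
TRACER_VERSION="1.38.0"

if version_ge "$AGENT_VERSION" "7.55.0" && version_ge "$TRACER_VERSION" "1.38.0"; then
  echo "versions OK"
else
  echo "upgrade required"
fi
```

The comparison sorts the two versions and checks that the required minimum sorts first, which avoids comparing `7.9.0` and `7.55.0` as plain strings.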

Follow these steps to enable Data Jobs Monitoring for Spark on Kubernetes.

Install the Datadog Agent on your Kubernetes cluster.
Inject Spark instrumentation.

Install the Datadog Agent on your Kubernetes cluster

If you have already installed the Datadog Agent on your Kubernetes cluster, ensure that you have enabled the Datadog Admission Controller. You can then go to the next step, Inject Spark instrumentation.

You can install the Datadog Agent using the Datadog Operator or Helm.

Prerequisites

Kubernetes cluster version v1.20.X+
Helm
The kubectl CLI

Installation

To install with the Datadog Operator, first install the Operator by running the following commands:

```
helm repo add datadog https://helm.datadoghq.com
helm install my-datadog-operator datadog/datadog-operator
```

Create a Kubernetes Secret to store your Datadog API key and app key.
```
kubectl create secret generic datadog-secret --from-literal api-key=<DATADOG_API_KEY> --from-literal app-key=<DATADOG_APP_KEY>
```
- Replace <DATADOG_API_KEY> with your Datadog API key.
- Replace <DATADOG_APP_KEY> with your Datadog app key.
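
One way to avoid typing the keys inline, and to catch a placeholder that was never replaced, is to read them from environment variables first. This is only a sketch: `is_filled_in`, `DD_API_KEY`, and `DD_APP_KEY` are hypothetical names chosen for illustration, not names defined by this guide.

```shell
# Succeeds only if the value is non-empty and is not an unreplaced <PLACEHOLDER>.
is_filled_in() {
  case "$1" in
    ""|"<"*">") return 1 ;;
    *) return 0 ;;
  esac
}

# DD_API_KEY and DD_APP_KEY are hypothetical environment variable names.
if is_filled_in "${DD_API_KEY:-}" && is_filled_in "${DD_APP_KEY:-}"; then
  kubectl create secret generic datadog-secret \
    --from-literal api-key="$DD_API_KEY" \
    --from-literal app-key="$DD_APP_KEY"
else
  echo "Set DD_API_KEY and DD_APP_KEY before creating the secret" >&2
fi
```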

Create a file, datadog-agent.yaml, that contains the following configuration:

```
kind: DatadogAgent
apiVersion: datadoghq.com/v2alpha1
metadata:
  name: datadog
spec:
  features:
    apm:
      enabled: true
      hostPortConfig:
        enabled: true
        hostPort: 8126
    admissionController:
      enabled: true
      mutateUnlabelled: false
  global:
    tags:
      - 'data_workload_monitoring_trial:true'
    site: <DATADOG_SITE>
    credentials:
      apiSecret:
        secretName: datadog-secret
        keyName: api-key
      appSecret:
        secretName: datadog-secret
        keyName: app-key
  override:
    nodeAgent:
      image:
        tag: <DATADOG_AGENT_VERSION>
      env:
        - name: DD_DJM_CONFIG_ENABLED
          value: "true"
```

Replace <DATADOG_SITE> with your Datadog site.

Replace <DATADOG_AGENT_VERSION> with version 7.55.0 or later.
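
If you prefer not to edit the file by hand, the two placeholders can be substituted with `sed`. The following is a minimal sketch under stated assumptions: `render_config` and the output file name `datadog-agent.rendered.yaml` are hypothetical, and `datadoghq.com` and `7.55.0` are example values only.

```shell
# Reads a config template on stdin and writes it to stdout with both
# placeholders filled in. $1 is the Datadog site, $2 is the Agent image tag.
# Assumption: neither value contains "|", "&", or "\", which are special to
# this sed expression.
render_config() {
  sed -e "s|<DATADOG_SITE>|$1|g" -e "s|<DATADOG_AGENT_VERSION>|$2|g"
}

# Example usage (example values; use your own site and Agent version):
#   render_config datadoghq.com 7.55.0 < datadog-agent.yaml > datadog-agent.rendered.yaml
#   kubectl apply -f datadog-agent.rendered.yaml
```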

Deploy the Datadog Agent with the above configuration file:
```
kubectl apply -f /path/to/your/datadog-agent.yaml
```

To install with Helm instead, create a Kubernetes Secret to store your Datadog API key and app key.
```
kubectl create secret generic datadog-secret --from-literal api-key=<DATADOG_API_KEY> --from-literal app-key=<DATADOG_APP_KEY>
```
- Replace <DATADOG_API_KEY> with your Datadog API key.
- Replace <DATADOG_APP_KEY> with your Datadog app key.

Create a file, datadog-values.yaml, that contains the following configuration:

```
datadog:
  apiKeyExistingSecret: datadog-secret
  appKeyExistingSecret: datadog-secret
  site: <DATADOG_SITE>
  apm:
    portEnabled: true
    port: 8126
  tags:
    - 'data_workload_monitoring_trial:true'
  env:
    - name: DD_DJM_CONFIG_ENABLED
      value: "true"

agents:
  image:
    tag: <DATADOG_AGENT_VERSION>

clusterAgent:
  admissionController:
    enabled: true
    mutateUnlabelled: false
```

Replace <DATADOG_SITE> with your Datadog site.

Replace <DATADOG_AGENT_VERSION> with version 7.55.0 or later.

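Install the Datadog Agent with Helm. If you have not already added the Datadog Helm repository (helm repo add datadog https://helm.datadoghq.com), do that first, then run the following command: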
```
helm install <RELEASE_NAME> \
 -f datadog-values.yaml \
 --set targetSystem=<TARGET_SYSTEM> \
 datadog/datadog
```
- Replace <RELEASE_NAME> with your release name. For example, datadog-agent.
- Replace <TARGET_SYSTEM> with the name of your OS. For example, linux or windows.

Inject Spark instrumentation

When you run your Spark job, use the following configurations:

spark.kubernetes.{driver,executor}.label.admission.datadoghq.com/enabled (Required): true

spark.kubernetes.{driver,executor}.annotation.admission.datadoghq.com/java-lib.version (Required): latest

spark.{driver,executor}.extraJavaOptions: JVM options, separated by spaces:

-Ddd.data.jobs.enabled (Required): Set to true.
-Ddd.service (Optional): Your service name. Because this option sets the job name in Datadog, use a human-readable name.
-Ddd.env (Optional): Your environment, such as prod or dev.
-Ddd.version (Optional): Your version.
-Ddd.tags (Optional): Other tags you wish to add, in the format <KEY_1>:<VALUE_1>,<KEY_2>:<VALUE_2>.
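
Because the same option string is passed to both the driver and the executor, it can be convenient to assemble it once. The following is a minimal sketch, not an official helper: `dd_java_opts` is a hypothetical function name, and the job metadata in the usage lines is invented for illustration.

```shell
# Prints a value for spark.{driver,executor}.extraJavaOptions.
# Arguments: service, env, version, tags. Each is optional; pass "" to skip one.
# The required -Ddd.data.jobs.enabled=true option is always included.
dd_java_opts() {
  local opts="-Ddd.data.jobs.enabled=true"
  if [ -n "${1:-}" ]; then opts="$opts -Ddd.service=$1"; fi
  if [ -n "${2:-}" ]; then opts="$opts -Ddd.env=$2"; fi
  if [ -n "${3:-}" ]; then opts="$opts -Ddd.version=$3"; fi
  if [ -n "${4:-}" ]; then opts="$opts -Ddd.tags=$4"; fi
  printf '%s\n' "$opts"
}

# Hypothetical job metadata, for illustration only.
JAVA_OPTS="$(dd_java_opts "nightly-etl" "prod" "1.4.0" "team:data,owner:analytics")"
echo "$JAVA_OPTS"
```

The result can then be passed as `--conf spark.driver.extraJavaOptions="$JAVA_OPTS"` and `--conf spark.executor.extraJavaOptions="$JAVA_OPTS"`.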

Example: spark-submit

```
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master k8s://<CLUSTER_ENDPOINT> \
  --conf spark.kubernetes.container.image=895885662937.dkr.ecr.us-west-2.amazonaws.com/spark/emr-6.10.0:latest \
  --deploy-mode cluster \
  --conf spark.kubernetes.namespace=<NAMESPACE> \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=<SERVICE_ACCOUNT> \
  --conf spark.kubernetes.authenticate.executor.serviceAccountName=<SERVICE_ACCOUNT> \
  --conf spark.kubernetes.driver.label.admission.datadoghq.com/enabled=true \
  --conf spark.kubernetes.executor.label.admission.datadoghq.com/enabled=true \
  --conf spark.kubernetes.driver.annotation.admission.datadoghq.com/java-lib.version=latest \
  --conf spark.kubernetes.executor.annotation.admission.datadoghq.com/java-lib.version=latest \
  --conf spark.driver.extraJavaOptions="-Ddd.data.jobs.enabled=true -Ddd.service=<JOB_NAME> -Ddd.env=<ENV> -Ddd.version=<VERSION> -Ddd.tags=<KEY_1>:<VALUE_1>,<KEY_2>:<VALUE_2>" \
  --conf spark.executor.extraJavaOptions="-Ddd.data.jobs.enabled=true -Ddd.service=<JOB_NAME> -Ddd.env=<ENV> -Ddd.version=<VERSION> -Ddd.tags=<KEY_1>:<VALUE_1>,<KEY_2>:<VALUE_2>" \
  local:///usr/lib/spark/examples/jars/spark-examples.jar 20
```
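
The four Admission Controller settings in this example differ only in the role (driver or executor), so they can be generated in a loop instead of being written out by hand. This is a minimal sketch, assuming a POSIX-style shell; `dd_admission_confs` is a hypothetical helper name.

```shell
# Prints the required label and annotation settings, one "--conf key=value"
# pair per line, for both the driver and the executor.
dd_admission_confs() {
  local role
  for role in driver executor; do
    printf '%s\n' "--conf spark.kubernetes.${role}.label.admission.datadoghq.com/enabled=true"
    printf '%s\n' "--conf spark.kubernetes.${role}.annotation.admission.datadoghq.com/java-lib.version=latest"
  done
}

# Example usage. The substitution is left unquoted on purpose so the shell
# splits it into separate arguments; none of the values contain spaces.
#   spark-submit $(dd_admission_confs) --class <MAIN_CLASS> ...
```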

Example: AWS start-job-run

```
aws emr-containers start-job-run \
--virtual-cluster-id <EMR_CLUSTER_ID> \
--name myjob \
--execution-role-arn <EXECUTION_ROLE_ARN> \
--release-label emr-6.10.0-latest \
--job-driver '{
  "sparkSubmitJobDriver": {
    "entryPoint": "s3://BUCKET/spark-examples.jar",
    "sparkSubmitParameters": "--class <MAIN_CLASS> --conf spark.kubernetes.driver.label.admission.datadoghq.com/enabled=true --conf spark.kubernetes.executor.label.admission.datadoghq.com/enabled=true --conf spark.kubernetes.driver.annotation.admission.datadoghq.com/java-lib.version=latest --conf spark.kubernetes.executor.annotation.admission.datadoghq.com/java-lib.version=latest --conf spark.driver.extraJavaOptions=\"-Ddd.data.jobs.enabled=true -Ddd.service=<JOB_NAME> -Ddd.env=<ENV> -Ddd.version=<VERSION> -Ddd.tags=<KEY_1>:<VALUE_1>,<KEY_2>:<VALUE_2>\" --conf spark.executor.extraJavaOptions=\"-Ddd.data.jobs.enabled=true -Ddd.service=<JOB_NAME> -Ddd.env=<ENV> -Ddd.version=<VERSION> -Ddd.tags=<KEY_1>:<VALUE_1>,<KEY_2>:<VALUE_2>\""
  }
}'
```
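
The nested quoting in the --job-driver argument is easy to get wrong when written by hand, so one option is to generate the JSON and let a helper escape the inner quotes. The following is a sketch under stated assumptions: `job_driver_json` is a hypothetical helper name, and the entry point is assumed to contain no double quotes or backslashes.

```shell
# Prints the JSON for --job-driver on a single line.
# $1 is the entry point; $2 is the spark-submit parameter string, which may
# contain double quotes.
job_driver_json() {
  local escaped
  # Escape backslashes first, then double quotes, so $2 becomes a valid JSON string.
  escaped="$(printf '%s' "$2" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')"
  printf '{"sparkSubmitJobDriver": {"entryPoint": "%s", "sparkSubmitParameters": "%s"}}\n' "$1" "$escaped"
}

# Example usage, keeping the placeholders from the example above:
#   PARAMS='--class <MAIN_CLASS> --conf spark.driver.extraJavaOptions="-Ddd.data.jobs.enabled=true -Ddd.service=<JOB_NAME>"'
#   aws emr-containers start-job-run ... --job-driver "$(job_driver_json "s3://BUCKET/spark-examples.jar" "$PARAMS")"
```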

Validation

In Datadog, view the Data Jobs Monitoring page to see a list of all your data processing jobs.

Advanced Configuration

Tag spans at runtime

You can set tags on Spark spans at runtime. These tags are applied only to spans that start after the tag is added.

```
// Add a tag to all subsequent Spark computations
sparkContext.setLocalProperty("spark.datadog.tags.key", "value")
spark.read.parquet(...)
```

To remove a runtime tag:

```
// Remove the tag from all subsequent Spark computations
sparkContext.setLocalProperty("spark.datadog.tags.key", null)
```