Enable Data Jobs Monitoring for Databricks

Data Jobs Monitoring gives visibility into the performance and reliability of your Apache Spark and Databricks jobs.

Variable	Description	Default
DD_API_KEY	Your Datadog API key.
DD_SITE	Your Datadog site.
DATABRICKS_WORKSPACE	Name of your Databricks workspace. It should match the name provided in the Datadog-Databricks integration step. Enclose the name in double quotes if it contains whitespace.
DRIVER_LOGS_ENABLED	Collect Spark driver logs in Datadog.	false
WORKER_LOGS_ENABLED	Collect Spark worker logs in Datadog.	false
DD_DJM_ADD_LOGS_TO_FAILURE_REPORT	Include init script logs for debugging when reporting a failure back to Datadog.	false

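Validation

In Datadog, open the Data Jobs Monitoring page to see a list of all your Databricks jobs.

Advanced configuration

Tag spans at runtime

You can set tags on Spark spans at runtime. These tags apply only to spans that start after the tag is added.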

// Add a tag for all subsequent Spark computations
sparkContext.setLocalProperty("spark.datadog.tags.key", "value")
spark.read.parquet(...)

To remove a runtime tag:

// Remove the tag for all subsequent Spark computations
sparkContext.setLocalProperty("spark.datadog.tags.key", null)

Aggregate cluster metrics from one-time job runs

This configuration applies if you want cluster resource utilization data for your jobs and you create a new job and cluster for each run through the one-time run API endpoint (common with orchestration tools outside of Databricks, such as Airflow or Azure Data Factory).

If you submit Databricks jobs through the one-time run API endpoint, each job run has a unique job ID. This can make it difficult to group and analyze cluster metrics for jobs that use ephemeral clusters. To aggregate cluster utilization from the same job and assess performance across multiple runs, set the DD_JOB_NAME variable inside the spark_env_vars of every new_cluster to the same value as your request payload’s run_name.

Here’s an example of a one-time job run request body:

{
   "run_name": "Example Job",
   "idempotency_token": "8f018174-4792-40d5-bcbc-3e6a527352c8",
   "tasks": [
      {
         "task_key": "Example Task",
         "description": "Description of task",
         "depends_on": [],
         "notebook_task": {
            "notebook_path": "/Path/to/example/task/notebook",
            "source": "WORKSPACE"
         },
         "new_cluster": {
            "num_workers": 1,
            "spark_version": "13.3.x-scala2.12",
            "node_type_id": "i3.xlarge",
            "spark_env_vars": {
               "DD_JOB_NAME": "Example Job"
            }
         }
      }
   ]
}
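
If you build the request body programmatically (for example, in an orchestration task), the rule above can be applied with a small helper. The following is a minimal Python sketch, not part of the Datadog or Databricks tooling: the function name with_dd_job_name is hypothetical, and it assumes the payload is a plain dictionary shaped like the example above, with clusters defined per task under new_cluster. Sending the request is out of scope here.

```python
import copy
import json


def with_dd_job_name(payload: dict) -> dict:
    """Return a copy of a one-time run payload in which every task-level
    new_cluster has DD_JOB_NAME set to the payload's run_name.

    Hypothetical helper for illustration; only task-level new_cluster
    entries are handled.
    """
    result = copy.deepcopy(payload)  # leave the caller's payload untouched
    run_name = result["run_name"]
    for task in result.get("tasks", []):
        cluster = task.get("new_cluster")
        if cluster is None:
            continue  # no new cluster defined on this task; nothing to set
        # Create spark_env_vars if missing, then set (or overwrite) DD_JOB_NAME.
        cluster.setdefault("spark_env_vars", {})["DD_JOB_NAME"] = run_name
    return result


payload = {
    "run_name": "Example Job",
    "tasks": [
        {
            "task_key": "Example Task",
            "notebook_task": {"notebook_path": "/Path/to/example/task/notebook"},
            "new_cluster": {
                "num_workers": 1,
                "spark_version": "13.3.x-scala2.12",
                "node_type_id": "i3.xlarge",
            },
        }
    ],
}

body = with_dd_job_name(payload)
print(json.dumps(body, indent=3))
```

Deriving DD_JOB_NAME from run_name in one place keeps the two values from drifting apart when the job is renamed, which would otherwise split the cluster metrics across two names.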