Datadog offers several Databricks monitoring capabilities.
Data Jobs Monitoring provides monitoring for your Databricks jobs and clusters. You can detect problematic Databricks jobs and workflows anywhere in your data pipelines, remediate failed and long-running jobs faster, and optimize cluster resources to reduce costs.
Cloud Cost Management lets you analyze all your Databricks DBU costs alongside the associated cloud spend.
Log Management enables you to aggregate and analyze logs from your Databricks jobs & clusters. You can collect these logs as part of Data Jobs Monitoring.
Infrastructure Monitoring gives you a limited subset of the Data Jobs Monitoring functionality: visibility into the resource utilization of your Databricks clusters and into Apache Spark performance metrics.
Model serving metrics provide insights into how your Databricks model serving infrastructure is performing. With these metrics, you can detect endpoints that have high error rates or high latency, are over- or under-provisioned, and more.
In your Databricks workspace, click your profile in the top right corner and go to Settings. Select Developer in the left sidebar. Next to Access tokens, click Manage.
Click Generate new token, enter “Datadog Integration” in the Comment field, remove the default value in Lifetime (days), and click Generate. Take note of your token.
Important:
Make sure you delete the default value in Lifetime (days) so that the token doesn’t expire and the integration doesn’t break.
Ensure the account generating the token has CAN VIEW access for the Databricks jobs and clusters you want to monitor.
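If you prefer to script this step, you can also create the token through the Databricks Token API. The following is a minimal sketch, assuming your workspace URL and an existing credential are available in the environment variables shown (both names are placeholders):

```shell
# Create a personal access token via the Token API instead of the UI.
# $DATABRICKS_HOST is your workspace URL and $DATABRICKS_TOKEN is an existing
# credential used to authenticate this request (both are assumptions).
curl -X POST "$DATABRICKS_HOST/api/2.0/token/create" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -d '{"comment": "Datadog Integration"}'   # omitting lifetime_seconds creates a non-expiring token
# The response contains "token_value"; record it, as it cannot be retrieved later.
```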
Configure the Spark integration to monitor your Apache Spark Cluster on Databricks and collect system and Spark metrics.
Each script described below can be modified to suit your needs. For instance, you can:
Add specific tags to your instances (see the sketch after this list).
Modify the Spark integration configuration.
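For example, here is a hedged sketch of a tag customization you might add to the init script, assuming the standard Linux Agent configuration path; if the script you chose already writes /etc/datadog-agent/datadog.yaml, edit that section directly instead. The tag values are placeholders:

```shell
# Hypothetical addition inside the init script, after the Agent is installed:
# append custom tags to the Agent configuration (tag values are placeholders).
cat <<EOF >> /etc/datadog-agent/datadog.yaml
tags:
  - team:data-eng
  - pipeline:nightly-etl
EOF
```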
You can also define or modify environment variables, along with the cluster-scoped init script path, using the UI, the Databricks CLI, or the Clusters API:
Set DD_API_KEY to your Datadog API key.
Set DD_ENV to better identify your clusters.
Set DD_SITE to your Datadog site (defaults to datadoghq.com).
For security reasons, it's not recommended to define the `DD_API_KEY` environment variable in plain text directly in the UI. Instead, use Databricks secrets.
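As an illustration, the cluster's Environment variables field might contain values like the following, with DD_API_KEY resolved from a Databricks secret at cluster start. The datadog scope and api-key secret names are placeholders you create yourself; legacy Databricks CLI syntax is shown in the comments, and newer CLI versions use different commands:

```shell
# Create the secret once, for example with the Databricks CLI:
#   databricks secrets create-scope --scope datadog
#   databricks secrets put --scope datadog --key api-key
# Then reference it in the cluster's "Environment variables" field:
DD_API_KEY={{secrets/datadog/api-key}}
DD_ENV=prod                # hypothetical environment tag
DD_SITE=datadoghq.com      # your Datadog site
```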
A global init script runs on every cluster created in your workspace. Global init scripts are useful when you want to enforce organization-wide library configurations or security screens.
Only workspace admins can manage global init scripts.
Global init scripts only run on clusters configured with single user or legacy no-isolation shared access mode. Therefore, Databricks recommends configuring all init scripts as cluster-scoped and managing them across your workspace using cluster policies.
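As a sketch of that recommendation, a cluster policy can pin the Datadog init script and API key so that every cluster created from the policy ships the Agent. The attribute paths, script location, and secret reference below are assumptions to adapt to your workspace:

```shell
# Hedged sketch of a cluster policy definition; paste it into the policy UI or
# pass it to the Cluster Policies API. Paths and names are placeholders.
cat > datadog_cluster_policy.json <<'EOF'
{
  "init_scripts.0.workspace.destination": {
    "type": "fixed",
    "value": "/Shared/datadog_init_script.sh"
  },
  "spark_env_vars.DD_API_KEY": {
    "type": "fixed",
    "value": "{{secrets/datadog/api-key}}"
  }
}
EOF
```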
Use the Databricks UI to edit the global init scripts:
Choose one of the following scripts to install the Agent on the driver or on the driver and worker nodes of the cluster.
Modify the script to suit your needs. For example, you can add tags or define a specific configuration for the integration.
Go to the Admin Settings and click the Global Init Scripts tab.
Click on the + Add button.
Name the script (for example, Datadog init script), then paste the script into the Script field.
Click on the Enabled toggle to enable it.
Click on the Add button.
After these steps, any new cluster uses the script automatically. More information on global init scripts is available in the official Databricks documentation.
You can define several init scripts and specify their order in the UI.
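If you manage scripts programmatically, the same script can also be registered through the Global Init Scripts API. A minimal sketch, assuming the script exists locally and that the workspace URL and an authentication token are available as environment variables:

```shell
# Register a global init script via the API; the script body must be base64-encoded.
SCRIPT_B64=$(base64 -w0 datadog_init_script.sh)   # on macOS, use: base64 -i datadog_init_script.sh
curl -X POST "$DATABRICKS_HOST/api/2.0/global-init-scripts" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -d "{\"name\": \"Datadog init script\", \"script\": \"$SCRIPT_B64\", \"enabled\": true, \"position\": 0}"
```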
Cluster-scoped init scripts are init scripts defined in a cluster configuration. Cluster-scoped init scripts apply to both clusters you create and those created to run jobs. Databricks supports configuration and storage of init scripts through:
Workspace Files
Unity Catalog Volumes
Cloud Object Storage
Use the Databricks UI to edit the cluster to run the init script:
Choose one of the following scripts to install the Agent on the driver or on the driver and worker nodes of the cluster.
Modify the script to suit your needs. For example, you can add tags or define a specific configuration for the integration.
Save the script into your workspace using the Workspace menu on the left. If using a Unity Catalog volume, save the script in your volume using the Catalog menu on the left.
On the cluster configuration page, click the Advanced options toggle.
In the Environment variables field, specify the DD_API_KEY environment variable and, optionally, the DD_ENV and DD_SITE environment variables.
Go to the Init Scripts tab.
In the Destination dropdown, select the Workspace destination type. If using a Unity Catalog volume, select the Volume destination type instead.
Specify a path to the init script.
Click on the Add button.
If you stored your datadog_init_script.sh directly in the Shared workspace, you can access the file at the following path: /Shared/datadog_init_script.sh.
If you stored your datadog_init_script.sh directly in a user workspace, you can access the file at the following path: /Users/$EMAIL_ADDRESS/datadog_init_script.sh.
If you stored your datadog_init_script.sh directly in a Unity Catalog Volume, you can access the file at the following path: /Volumes/$VOLUME_PATH/datadog_init_script.sh.
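Equivalently, if you create or edit the cluster through the Clusters API or the Databricks CLI instead of the UI, the Datadog-related portion of the cluster specification might look like the following sketch. The paths, the secret reference, and the DD_ENV value are assumptions; if the script lives in a Unity Catalog volume, use a volumes destination (for example, { "volumes": { "destination": "/Volumes/$VOLUME_PATH/datadog_init_script.sh" } }) instead of the workspace destination:

```shell
# Fragment of a cluster specification (for example, the JSON passed to the
# Clusters API); only the Datadog-related fields are shown.
cat <<'EOF'
{
  "spark_env_vars": {
    "DD_API_KEY": "{{secrets/datadog/api-key}}",
    "DD_ENV": "prod",
    "DD_SITE": "datadoghq.com"
  },
  "init_scripts": [
    { "workspace": { "destination": "/Shared/datadog_init_script.sh" } }
  ]
}
EOF
```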