Getting Started with Monitors

Overview

With Datadog alerting, you can create monitors that actively check metrics, integration availability, network endpoints, and more. Use monitors to draw attention to the systems that require observation, inspection, and intervention.

This page introduces monitors and walks through setting up a metric monitor. A metric monitor sends alerts and notifications when a specific metric is above or below a certain threshold. For example, a metric monitor can alert you when disk space is low.

This guide covers:

  • Monitor creation and configuration
  • Setting up monitor alerts
  • Customizing notification messages
  • Monitor permissions

Prerequisites

Before getting started, you need a Datadog account linked to a host with the Datadog Agent installed. To learn more about the Agent, see the Getting started with the Agent guide, or navigate to Integrations > Agent to view installation instructions.

To verify that the Datadog Agent is running, check that your Infrastructure List in Datadog is populated.

Create a monitor

To create a monitor, navigate to Monitors > New Monitor and select Metric.

Configure

The main components of monitor configuration are:

  • Choose the detection method: How do you measure the condition you want to alert on? Are you concerned about a metric value crossing a threshold, a change in a value crossing a threshold, an anomalous value, or something else?
  • Define the metric: What value are you monitoring? The disk space in your system? The number of errors encountered for logins?
  • Set the alert conditions: When does an engineer need to be woken up?
  • Configure notifications and automations: What information needs to be in the alert?
  • Define permissions and audit notifications: Who has access to these alerts, and who should be notified if the alert is modified?

Choose the detection method

When you create a metric monitor, Threshold Alert is automatically selected as the detection method. A threshold alert compares metric values against user-defined thresholds. The goal for this monitor is to alert on a static threshold, so no change is necessary.

Define the metric

To get an alert on low disk space, use the system.disk.in_use metric from the Disk integration and average the metric over host and device:

Define the metric for system.disk.in_use avg by host and device
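
The metric definition above can also be written as a monitor query string. The following is a minimal sketch, not the exact query the UI generates: the helper name build_threshold_query is hypothetical, and the last_5m evaluation window, the avg time aggregation, and the {*} scope are assumptions chosen for illustration.

```python
def build_threshold_query(metric, group_by, threshold,
                          window="last_5m", time_agg="avg",
                          space_agg="avg", scope="*"):
    """Assemble a threshold query of the form
    time_agg(window):space_agg:metric{scope} by {groups} > threshold."""
    groups = ",".join(group_by)
    return (
        f"{time_agg}({window}):{space_agg}:{metric}"
        f"{{{scope}}} by {{{groups}}} > {threshold}"
    )


# Average system.disk.in_use, grouped by host and device (assumed 0.9 threshold).
query = build_threshold_query("system.disk.in_use", ["host", "device"], 0.9)
print(query)
```

Grouping by host and device is what makes this a multi alert monitor: each host and device pair is evaluated separately.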

Set alert conditions

According to the Disk integration documentation, system.disk.in_use is the amount of disk space in use as a fraction of the total. A reported value of 0.7 therefore means the device is 70% full.

To alert on low disk space, the monitor should trigger when the metric is above the threshold. Choose threshold values that suit your environment. Because this metric is a fraction, values between 0 and 1 are appropriate.

Set the following thresholds:

Alert threshold: > 0.9
Warning threshold: > 0.8

For this example, leave the other settings in this section on the defaults. For more details, see the Metric Monitors documentation.

Set the alert and warning thresholds for the monitor to trigger alerts
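
To make the threshold logic concrete, here is a small sketch of how a single evaluated value maps to a monitor state under the thresholds above. It is a simplification: the function name classify and the state labels are hypothetical, and a real monitor evaluates the aggregated value over its evaluation window rather than one raw data point.

```python
ALERT_THRESHOLD = 0.9    # alert when more than 90% of the disk is in use
WARNING_THRESHOLD = 0.8  # warn when more than 80% of the disk is in use


def classify(in_use):
    """Map a system.disk.in_use fraction (0 to 1) to a state label."""
    if in_use > ALERT_THRESHOLD:
        return "alert"
    if in_use > WARNING_THRESHOLD:
        return "warn"
    return "ok"


for value in (0.7, 0.85, 0.95):
    # The fraction is shown as a percentage to match the "70% full" reading.
    print(f"{value:.0%} full -> {classify(value)}")
```

Both comparisons are strict, so a value of exactly 0.9 is classified as a warning rather than an alert.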

Notifications and automations

When this monitor triggers, a notification message is sent. In this message, you can include conditional values, instructions for resolution, or a summary of the alert. At a minimum, a notification must have a title and message.

Title

The title must be unique for each monitor. Because this is a multi alert monitor, the name of each group element (host and device) is available through message template variables:

Disk space is low on {{device.name}} / {{host.name}}
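
As an illustration of what the template variables in this title do, the sketch below substitutes one group's values into the title. This is not Datadog's rendering code: the render helper and the sample values /dev/sda1 and web-01 are hypothetical.

```python
import re


def render(template, group):
    """Replace each {{name}} placeholder with the matching value from group.
    Placeholders with no matching value are left untouched."""
    def lookup(match):
        return group.get(match.group(1), match.group(0))
    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", lookup, template)


TITLE = "Disk space is low on {{device.name}} / {{host.name}}"

# One notification is rendered per (host, device) group that triggers.
title = render(TITLE, {"device.name": "/dev/sda1", "host.name": "web-01"})
print(title)
```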

Message

Use the message to tell your team how to resolve the issue, for example:

Steps to free up disk space:
1. Remove unused packages
2. Clear APT cache
3. Uninstall unnecessary applications
4. Remove duplicate files

To add conditional messages based on alert vs. warning thresholds, see the available Notification Variables you can include in your message.
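
For example, a message can wrap text in {{#is_alert}} and {{#is_warning}} blocks so that each threshold gets its own wording. The sketch below shows such a message and a toy renderer that mimics the effect. The message wording and the render_conditionals helper are assumptions for illustration, and the renderer handles only this simple, non-nested case.

```python
import re

MESSAGE = """\
{{#is_alert}}Disk usage is above 90%. Free up space now.{{/is_alert}}
{{#is_warning}}Disk usage is above 80%. Plan a cleanup.{{/is_warning}}
"""


def render_conditionals(template, state):
    """Keep the body of the block whose name equals state; drop the others."""
    def keep_or_drop(match):
        name, body = match.group(1), match.group(2)
        return body if name == state else ""
    rendered = re.sub(r"\{\{#(\w+)\}\}(.*?)\{\{/\1\}\}", keep_or_drop,
                      template, flags=re.DOTALL)
    return rendered.strip()


print(render_conditionals(MESSAGE, "is_alert"))
print(render_conditionals(MESSAGE, "is_warning"))
```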

Notify your services and your team members

Send notifications to your team through email, Slack, PagerDuty, and more. You can search for team members and connected accounts with the dropdown box.

Add a monitor message and automations to your alert notification

To add a workflow from Workflow Automation or a case from Case Management to the alert notification, click Add Workflow or Add Case. You can also tag Datadog Team members using the @team handle.

Leave the other sections as-is. For more information on what each configuration option does, see the Monitor configuration documentation.
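
The settings chosen so far can be summarized as a single monitor definition. The sketch below builds one as a Python dictionary in the general shape of a Datadog Monitors API request body. Treat the field layout as an assumption to check against the API reference. The @slack-ops-alerts handle is a hypothetical notification target, and the last_5m window is assumed. Nothing is sent over the network.

```python
import json

monitor = {
    # Title with template variables for each group element.
    "name": "Disk space is low on {{device.name}} / {{host.name}}",
    "type": "metric alert",
    # Threshold query grouped by host and device (assumed 5-minute window).
    "query": "avg(last_5m):avg:system.disk.in_use{*} by {host,device} > 0.9",
    "message": (
        "Steps to free up disk space:\n"
        "1. Remove unused packages\n"
        "2. Clear APT cache\n"
        "3. Uninstall unnecessary applications\n"
        "4. Remove duplicate files\n"
        "@slack-ops-alerts"  # hypothetical notification handle
    ),
    # The alert threshold maps to "critical"; it matches the query threshold.
    "options": {"thresholds": {"critical": 0.9, "warning": 0.8}},
}

print(json.dumps(monitor, indent=2))
```

Keeping the definition as data makes it easier to review the thresholds and message alongside the query before creating the monitor.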

Permissions

Click Edit Access to restrict the editing of your monitor to its creator, teams, users, groups, or to specific roles in your organization. Optionally, select Notify to be alerted when the monitor is modified.

Set access permissions for a monitor and options for audit notifications

For more information, see Granular Access Control.

View Monitors and Triage Alerts on Mobile

Download the Datadog Mobile App, available on the Apple App Store and Google Play Store, to view and mute monitors or to view Monitor Saved Views from your mobile home screen. This helps you triage alerts when you are away from your laptop or desktop.

Incidents on Mobile App

