Alert on anomalous p99 latency of a database service

3 minutes to complete

With APM, Datadog lets you set monitors that track the health of your services, so you don’t have to watch them constantly yourself. In this example, we’ll use an anomaly detection monitor. Anomaly detection is an algorithmic feature that identifies when a metric is behaving differently than it has in the past, taking into account trends as well as seasonal day-of-week and time-of-day patterns. It is well suited for metrics with strong trends and recurring patterns that are hard or impossible to monitor with threshold-based alerting.

  1. Open the New Monitor Page and choose APM.

  2. Choose your environment under Primary Tags and choose the database to monitor under Service.

    Under Resource, you can choose to monitor specific queries run in the database, but this example looks at overall performance, so leave it as *.

    Once you choose a service, the next step becomes available for you to set, and a chart appears at the top of the page showing the performance of the metric that the new monitor tracks.

  3. Choose an Anomaly Alert and under the For option select p99 latency.

    Once you choose Anomaly Alert, the chart also shows the baseline expected behavior for the metric you chose (in our case, p99 latency).

  4. Set the Alert when field value to 100%.

    This means that all of the events for the selected duration have to be anomalous for the alert to trigger. This is a best practice for starting with Anomaly Detection. Over time, you’ll find the right values that fit your situation. You can find out more about Anomaly Detection Monitors in the FAQ.

  5. Change the alert notification.

    In this example, you can either leave the notification content with the default text or choose team members to tag in the alert.

    You can read more about the markup for notification text and what values and conditions you can set there in the notifications overview.

  6. Make sure your username appears in the Configure notifications and automations notification field and add any additional team members that should be notified in case of a database latency anomaly.

    Note: To add another user, type @ followed by their username.

    Click Save.

    Your alert is now set. From this screen you can tweak any of the parameters and follow the metric’s performance.

  7. Switch from the Edit tab to the Status tab.

    Here you can see the current status of your monitor, mute it, or explore deeper into the specifics of a triggered alert.

  8. Navigate back to the Service Catalog and find the service you just set the monitor on. Open its Service Page and click the Monitor bar under the header.

    Here you should see the new monitor alongside any other monitors set for the service, as well as suggested monitors.

    As you create monitors, you’ll find more services, metrics, and events to include and more complex conditions to set for them. Each monitor is connected to a service and can be accessed from the Service Page as well as the Service Map.

    For each service on the map, a green circle means all monitors are quiet, yellow means one or more monitors are sending warnings but none are alerting, red means one or more monitors are alerting, and gray means no monitor is set for the service.
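
The color rule for Service Map circles can be written out as a small decision function. The sketch below is only an illustration of the rule as described in this guide: the `MonitorState` enum and the `service_map_color` function are hypothetical names and are not part of any Datadog library.

```python
from enum import Enum


class MonitorState(Enum):
    """Hypothetical monitor states used only for this illustration."""
    OK = "ok"
    WARN = "warn"
    ALERT = "alert"


def service_map_color(states):
    """Return the circle color for a service, given its monitors' states."""
    states = list(states)
    if not states:
        return "gray"    # no monitor is set for the service
    if MonitorState.ALERT in states:
        return "red"     # one or more monitors are alerting
    if MonitorState.WARN in states:
        return "yellow"  # warnings, but nothing is alerting
    return "green"       # all monitors are quiet


if __name__ == "__main__":
    print(service_map_color([MonitorState.OK, MonitorState.WARN]))
```

Alerting takes precedence over warnings, so a service with one alerting monitor and several quiet ones is still shown in red.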

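The monitor configured in steps 1 through 6 could also be described as data, for example to keep it in version control. The following is a minimal sketch that only builds a dictionary and sends nothing. Several details are assumptions that this guide does not state: the trace metric name (`p99:trace.postgres.query`), the `'agile'` algorithm, the bounds value `2`, the `direction`, the 15-minute alert window, and the example notification handle. The helper `build_anomaly_monitor` is a hypothetical name. The query may differ from what the UI generates, so check the field names against the monitor API reference before using it.

```python
import json


def build_anomaly_monitor(service, env, notify, span_name="postgres.query"):
    """Build an anomaly monitor definition as a plain dict (nothing is sent)."""
    # ASSUMPTION: the p99 latency of the service is exposed under this
    # trace metric name; the real name depends on your database integration.
    metric = f"p99:trace.{span_name}{{env:{env},service:{service}}}"
    # ASSUMPTION: algorithm, bounds, direction, and windows are placeholders.
    # ">= 1" is meant to express the 100% setting from step 4: every point
    # in the alert window must be anomalous for the alert to trigger.
    query = (
        f"avg(last_4h):anomalies({metric}, 'agile', 2, "
        "direction='both', alert_window='last_15m', interval=60, "
        "count_default_zero='true') >= 1"
    )
    mentions = " ".join(f"@{handle}" for handle in notify)
    return {
        "name": f"Anomalous p99 latency on {service}",
        "type": "query alert",
        "query": query,
        "message": f"p99 latency on {service} is outside its expected range. {mentions}",
        "tags": [f"service:{service}", f"env:{env}"],
        "options": {
            "thresholds": {"critical": 1.0},
            "threshold_windows": {
                "trigger_window": "last_15m",
                "recovery_window": "last_15m",
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical service name, environment, and notification handle.
    monitor = build_anomaly_monitor("my-db", "prod", ["alice@example.com"])
    print(json.dumps(monitor, indent=2))
```

Keeping the definition as a pure function makes it easy to review the query and notification text before anything is created in your account.
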