Watchdog Insights for Logs

Overview

Datadog Log Management offers Watchdog Insights to help you resolve incidents faster with contextual insights in the Log Explorer. Watchdog Insights complement your expertise and instincts by surfacing suspect anomalies, outliers, and potential performance bottlenecks impacting a subset of users.

The log explorer showing the Watchdog Insights banner with five log anomalies

The Watchdog Insights banner appears in the Log Explorer and displays insights about the current query:

The Watchdog Insights banner in the collapsed view

To see an overview of all insights, expand the Watchdog Insight banner:

The Watchdog Insights banner showing three error outliers

To access the full Watchdog Insights side panel, click View all:

The Watchdog Insights side panel showing more details about the error outliers

Every insight comes with embedded interactions and a side panel with troubleshooting information. The insight interactions and side panel vary based on the Watchdog Insight type.

Insight Types

Watchdog Insights surfaces anomalies and outliers detected on specific tags, enabling you to investigate the root cause of an issue. Insights are discovered from APM, Continuous Profiler, Log Management, and infrastructure data that include the service tag. The two types of insights specific to Log Management are:

Log Anomaly Detection

Ingested logs are analyzed at the intake level where Watchdog performs aggregations on detected patterns as well as environment, service, source and status tags. These aggregated logs are scanned for anomalous behaviors, such as the following:

  • An emergence of logs with a warning or error status.
  • A sudden increase of logs with a warning or error status.

The logs surface as Insights in the Log Explorer, matching the search context and any restrictions applied to your role.

Click on a specific insight to see the full description of the detected anomaly as well as the list of patterns contributing to it.

Anomalies that Watchdog determines to be particularly severe are also surfaced in the Watchdog alerts feed and can be alerted on by setting up a Watchdog logs monitor. A severe anomaly is defined as:

  • containing error logs
  • lasting at least 10 minutes (to avoid transient errors)
  • having a significant increase (to avoid small increases)

For more information about searching logs in the Log Explorer, see Log Search Syntax and Custom Time Frames.

Error Outliers

Error outliers display fields such as faceted tags or attributes containing characteristics of errors that match the current query. Statistically overrepresented key:value pairs among errors provide hints into the root cause of problems.

Typical examples of error outliers include env:staging, docker_image:acme:3.1, and http.useragent_details.browser.family:curl.

In the banner card view, you can see:

  • The field name.
  • The proportion of errors and overall logs that the field contributes to.
The error outlier card showing a red bar with 73.3% of total errors and a blue bar with 8.31% of total errors

In the side panel card view, you can see the main log pattern of error logs with the field.

Error Outlier card (L)

In the full side panel view, you can see:

  • The timeseries of error logs that contain the field.
  • Tags that are often associated with the error logs.
  • A comprehensive list of log patterns.
Error Outlier side panel

Further Reading

PREVIEWING: rtrieu/product-analytics-ui-changes