Investigate Alerts

Get started with alert investigations

Enable Bits on monitors for automated investigations

There are two main ways to enable Bits for automated investigations:

  • Option 1: Use the Bits-Enabled Monitors list
    1. In Bits AI, go to the Bits-Enabled Monitors page.
    2. In the Monitors tab, select one or more monitors, then click Enable Bits AI.
  • Option 2: Add the Bits AI tag to a monitor
    1. In the Monitor List, select one or more monitors to edit.
      • To edit one monitor, click the monitor to open it, then click Edit.
      • To edit multiple monitors, select them, then click Edit tags.
    2. Add the bitsai:enabled tag to your selected monitors.

You can also add the tag to your desired monitors using the Datadog API or Terraform.
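As a rough illustration of the API route, the sketch below builds a request that adds the bitsai:enabled tag through the Datadog Monitors API (the v1 "edit a monitor" endpoint), using only the Python standard library. It assumes the default api.datadoghq.com site and that updating the tags field sets the monitor's full tag list, which is why the existing tags are merged first. The monitor ID, tags, and keys are placeholders, and the helper names are hypothetical.

```python
import json
import urllib.request

BITS_TAG = "bitsai:enabled"  # tag named in the steps above


def with_bits_tag(tags):
    """Return a copy of the tag list that includes the Bits tag exactly once."""
    merged = list(tags)
    if BITS_TAG not in merged:
        merged.append(BITS_TAG)
    return merged


def build_tag_update(monitor_id, current_tags, api_key, app_key,
                     site="api.datadoghq.com"):
    """Build (but do not send) a PUT request that updates a monitor's tags."""
    body = json.dumps({"tags": with_bits_tag(current_tags)}).encode("utf-8")
    return urllib.request.Request(
        url=f"https://{site}/api/v1/monitor/{monitor_id}",
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "DD-API-KEY": api_key,
            "DD-APPLICATION-KEY": app_key,
        },
    )


if __name__ == "__main__":
    # Placeholder values; substitute a real monitor ID, its current tags, and keys.
    request = build_tag_update(12345, ["team:payments"], "<api-key>", "<app-key>")
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request)
```

Keeping the request construction separate from sending makes it easy to inspect the payload before changing a live monitor.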

Bits supports investigations only for metric, log, APM, anomaly, forecast, integration, and outlier monitors.

Configure where investigation results are sent

Bits can send investigation results to several destinations. By default, results appear in two places:

  • Full investigation results are available on the Bits AI Investigations page.
  • A summary of the results is available on the status page for the monitor.

Additionally, if your monitor already includes @slack, @oncall, or @case notifications, Bits automatically sends investigation results to those destinations. Otherwise, you can add them as destinations:

  • Slack
    1. Ensure the Datadog Slack app is installed in your workspace.
    2. Go to Bits AI > Settings > Integrations and connect your Slack workspace.
    3. Go to a monitor. Under Configure notifications and automations, add the @slack-{channel-name} handle to send results to Slack.
  • On-Call
    • In the Configure notifications and automations section, add the @oncall-{team} handle.
  • Case Management
    • In the Configure notifications and automations section, add the @case-{project-name} handle.
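If you manage monitor messages in code, the handles above can be appended programmatically. The following is a minimal sketch, assuming the handle formats shown in this section; the function name and the channel and team names in the usage example are hypothetical.

```python
def add_destinations(message, slack_channel=None, oncall_team=None, case_project=None):
    """Append any notification handles that the monitor message does not already contain."""
    handles = []
    if slack_channel:
        handles.append(f"@slack-{slack_channel}")
    if oncall_team:
        handles.append(f"@oncall-{oncall_team}")
    if case_project:
        handles.append(f"@case-{case_project}")

    # Compare against whitespace-separated tokens so a handle is never added twice.
    existing = set(message.split())
    missing = [handle for handle in handles if handle not in existing]
    if not missing:
        return message

    parts = [message.rstrip()] if message.strip() else []
    return " ".join(parts + missing)


if __name__ == "__main__":
    # Hypothetical channel and team names.
    print(add_destinations("CPU is high on checkout.", slack_channel="ops-alerts", oncall_team="sre"))
```

The function leaves the message unchanged when every requested handle is already present, so it is safe to run repeatedly.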

Manually start an investigation

Alternatively, you can manually invoke Bits on an individual monitor event.

  • Option 1: Monitor Status Page
    • On the monitor status page for an alert event, click Investigate with Bits AI.
  • Option 2: Slack
    • In a reply to a monitor notification in Slack, type @Datadog Investigate this alert.

For best results, see Optimize monitors for Bits AI SRE.

Optimize monitors for Bits AI SRE

To help Bits produce the most accurate and helpful investigation results, follow these guidelines:

  • Scope the monitor to a service by either filtering the query to a specific service or grouping it by service, where appropriate.
  • Tag the monitor with a service, where appropriate.
  • Add relevant troubleshooting steps to the monitor message to give Bits a starting point. Think of the first page you’d visit in Datadog if this monitor were to fire. Consider including:
    • Plain-language instructions
    • At least one helpful link to:
      • A Datadog dashboard
      • A logs query
      • A trace query
      • A Datadog notebook with key graphs or instructions
Figure: Example monitor with optimization steps applied
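To make the guidelines above concrete, here is a hypothetical monitor definition that follows them, together with a small heuristic check. This is only a sketch: the service name, metric query, and links are placeholders, and guideline_gaps is an illustrative helper, not part of any Datadog library.

```python
import re

# Hypothetical monitor definition; the service, query, and URLs are placeholders.
example_monitor = {
    "name": "High error rate on checkout",
    "query": "sum(last_5m):sum:trace.http.request.errors{service:checkout}.as_count() > 50",
    "tags": ["service:checkout", "team:payments"],
    "message": (
        "Error rate on checkout is elevated.\n"
        "1. Check the service dashboard: https://app.datadoghq.com/dashboard/<dashboard-id>\n"
        "2. Review recent error logs: "
        "https://app.datadoghq.com/logs?query=service%3Acheckout%20status%3Aerror\n"
    ),
}


def guideline_gaps(monitor):
    """Return the guidelines that the monitor does not appear to meet (heuristic)."""
    gaps = []
    query = monitor.get("query", "")
    filtered = re.search(r"\{[^}]*\bservice:", query)        # e.g. {service:checkout}
    grouped = re.search(r"by\s*\{[^}]*\bservice\b", query)   # e.g. by {service}
    if not (filtered or grouped):
        gaps.append("query is not filtered to or grouped by a service")
    if not any(tag.startswith("service:") for tag in monitor.get("tags", [])):
        gaps.append("monitor has no service tag")
    if "https://" not in monitor.get("message", ""):
        gaps.append("message has no troubleshooting link")
    return gaps


if __name__ == "__main__":
    print(guideline_gaps(example_monitor))
```

A check like this could run in code review for monitors managed as code, though it only catches missing pieces and cannot judge whether the troubleshooting steps are actually useful.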

How Bits AI SRE investigates

Investigations happen in two phases:

  1. Initial context gathering
    1. Bits begins by looking at any troubleshooting steps, Confluence pages, or Datadog links that you’ve added to the monitor’s message, and uses them to make relevant queries.
    2. It also automatically scans your Datadog environment for additional context.
    3. Finally, if you have interacted with a previous investigation for the same monitor, Bits recalls any memories associated with that monitor.
  2. Root cause hypothesis generation and testing
    • Using the gathered context, Bits performs a more thorough investigation by building multiple root cause hypotheses and testing them in parallel. Bits can currently query:
      • Metrics
      • Traces
      • Logs
      • Dashboards
      • Change events
      • Watchdog insights
      • Monitor alerts
      • Incidents
    • Hypotheses can end in one of three states: validated, invalidated, or inconclusive.
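The second phase can be pictured with a small conceptual model. The sketch below is not how Bits is implemented; it only illustrates the idea of evaluating several hypotheses concurrently, each ending in one of the three states listed above. The evidence dictionary and the function names are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from enum import Enum


class Outcome(Enum):
    VALIDATED = "validated"
    INVALIDATED = "invalidated"
    INCONCLUSIVE = "inconclusive"


def evaluate(hypothesis, evidence):
    """Toy check: True supports, False refutes, and missing evidence is inconclusive."""
    supported = evidence.get(hypothesis)
    if supported is None:
        return Outcome.INCONCLUSIVE
    return Outcome.VALIDATED if supported else Outcome.INVALIDATED


def run_hypotheses(hypotheses, evidence):
    """Evaluate every hypothesis concurrently and return {hypothesis: Outcome}."""
    with ThreadPoolExecutor() as pool:
        outcomes = list(pool.map(lambda h: evaluate(h, evidence), hypotheses))
    return dict(zip(hypotheses, outcomes))


if __name__ == "__main__":
    # Invented evidence: in practice this would come from metrics, logs, traces, and so on.
    evidence = {"recent deploy": True, "database saturation": False}
    candidates = ["recent deploy", "database saturation", "upstream outage"]
    for name, outcome in run_hypotheses(candidates, evidence).items():
        print(f"{name}: {outcome.value}")
```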

For best results, see Optimize monitors for Bits AI SRE.

Chat with Bits AI SRE about the investigation

On the Bits AI Investigations page, you can chat with Bits to gather additional information about the investigation or the services involved. Click the Suggested replies bubble for examples.

Functionality | Example prompts | Data source
--- | --- | ---
Understand the status of its investigation | What's the latest status of the investigation? | Investigation findings
Ask for elaborations of its findings | Tell me more about the {issue}. | Investigation findings
Look up information about a service | Are there any ongoing incidents for {example-service}? | Software Catalog service definitions
Find recent changes for a service | Were there any recent changes on {example-service}? | Change Tracking events
Find a dashboard | Give me the {example-service} dashboard. | Dashboards
Query APM request, error, and duration metrics | What's the current error rate for {example-service}? | APM metrics
Search for information in Confluence | Find me the runbook in Confluence to rollback deployments for {example-service}. | Confluence

Help Bits AI SRE learn

Reviewing Bits’ findings not only validates their accuracy, but also helps Bits learn from any mistakes it makes, enabling it to produce faster and more accurate investigations in the future.

During the investigation

You can guide Bits’ learning by:

  • Improving a step: Share a link to a better query Bits should have made.
  • Remembering a step: Tell Bits to remember any helpful queries it generated. This instructs Bits to prioritize running these queries the next time the same monitor fires.

After the investigation

At the end of an investigation, tell Bits whether its conclusion was correct. If it was inaccurate, provide the correct root cause so that Bits can learn from the discrepancy.

Figure: An investigation conclusion, with the buttons for rating it helpful or unhelpful highlighted

Manage memories

Every piece of feedback you give generates a memory. Bits uses these memories to enhance future investigations by recalling relevant patterns, queries, and corrections. To view or delete memories, go to the Bits-Enabled Monitors page and use the Memories column.
