Incident Management Analytics
Overview
Incident Management Analytics is a queryable data source for aggregated incident statistics. You can query these analytics in a variety of graph widgets in both Dashboards and Notebooks to analyze the history of your incident response over time. To give you a starting point, Datadog provides an Incident Management Overview Dashboard template and Notebook template that you can clone and customize as necessary.
The following widgets support Incident Management Analytics:
- Timeseries
- Top List
- Query Value
Measures
Datadog provides the following aggregated measures out of the box for forming analytics queries:
- Count (*)
- Customer Impact Duration
- Status Active Duration (amount of time the incident was in Active status)
- Status Stable Duration (amount of time the incident was in Stable status)
- Time to Repair (customer impact end timestamp - incident creation timestamp)
- Time to Resolve (resolved timestamp - created timestamp)
In addition to these defaults, you can create new measures by adding custom Number property fields in your Incident Settings.
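As a rough illustration of how the two timestamp-based measures above are defined, the following Python sketch computes them from raw timestamps. The function names and the handling of missing timestamps are assumptions made for this example; they are not part of any Datadog API.

```python
from datetime import datetime, timedelta
from typing import Optional


def time_to_repair(created: datetime,
                   customer_impact_end: Optional[datetime]) -> Optional[timedelta]:
    """Customer impact end timestamp minus incident creation timestamp."""
    if customer_impact_end is None:
        # Assumption: the measure is undefined while customer impact is ongoing.
        return None
    return customer_impact_end - created


def time_to_resolve(created: datetime,
                    resolved: Optional[datetime]) -> Optional[timedelta]:
    """Resolved timestamp minus created timestamp."""
    if resolved is None:
        # Assumption: the measure is undefined until the incident is resolved.
        return None
    return resolved - created


# Hypothetical incident timestamps, for illustration only.
created = datetime(2024, 1, 8, 9, 0)
impact_end = datetime(2024, 1, 8, 10, 30)
resolved = datetime(2024, 1, 8, 12, 0)

print(time_to_repair(created, impact_end))
print(time_to_resolve(created, resolved))
```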
Graph configuration
To configure your graph using Incident Management Analytics data, follow these steps:
- Select your visualization.
- Select Incidents from the data source dropdown menu.
- Select a measure from the yellow dropdown menu.
  - Default Statistic: Counts the number of incidents.
- Select an aggregation for the measure.
- (Optional) Select a rollup for the measure.
- (Optional) Use the search bar to filter the statistic down to a specific subset of incidents.
- (Optional) Select a facet in the pink dropdown menu to break the measure up by group and select a limited number of groups to display.
- Title the graph.
- Save your widget.
Example: Weekly Outage Customer Impact Duration by Service
- Widget: Timeseries Line Graph
- Datasource: Incidents
- Measure: Customer Impact Duration
- Aggregation: avg
- Rollup: 1w
- Filter: severity:("SEV-1" OR "SEV-2")
- Group: Services, limit to top 5
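To make the semantics of this example concrete, here is a minimal Python sketch that applies the same filter, weekly rollup, average, and grouping to a small in-memory list. It does not call Datadog. The incident records, the `weekly_avg_impact` and `week_start` names, bucketing by creation date, and ranking the top services by overall average are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

# Hypothetical records:
# (creation date, severity, service, customer impact duration in seconds)
INCIDENTS = [
    (date(2024, 1, 8), "SEV-1", "checkout", 3600),
    (date(2024, 1, 10), "SEV-2", "checkout", 1800),
    (date(2024, 1, 9), "SEV-3", "search", 600),    # dropped by the severity filter
    (date(2024, 1, 16), "SEV-2", "search", 1200),
]


def week_start(day: date) -> date:
    """Return the Monday of the week containing `day` (the 1w rollup bucket)."""
    return day - timedelta(days=day.weekday())


def weekly_avg_impact(incidents, severities=("SEV-1", "SEV-2"), limit=5):
    # Filter: keep only incidents whose severity matches the filter.
    kept = [inc for inc in incidents if inc[1] in severities]

    # Group limit: keep the top `limit` services.
    # Assumption: services are ranked by overall average impact duration.
    per_service = defaultdict(list)
    for _, _, service, duration in kept:
        per_service[service].append(duration)
    top = sorted(per_service, key=lambda s: mean(per_service[s]), reverse=True)[:limit]

    # Rollup and aggregation: average duration per (week, service) bucket.
    buckets = defaultdict(list)
    for created, _, service, duration in kept:
        if service in top:
            buckets[(week_start(created), service)].append(duration)
    return {key: mean(values) for key, values in buckets.items()}


result = weekly_avg_impact(INCIDENTS)
for (week, service), avg_seconds in sorted(result.items()):
    print(week, service, avg_seconds)
```

Each key in the result corresponds to one point on one line of the timeseries: a week bucket for a single service.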