Every incident in Datadog has its own Incident Details page where you can manage your incident’s property fields, signals, tasks, documents, responders, and notifications. An Incident Details page is available after you create a new incident. The Incident Details page contains a global header for quick access to key actions, while the remaining body of the page is divided into different sections using tabs to group related incident data together. The first of these sections is the Overview tab.
The global header provides access to the Status and Severity selectors and links to your Incident Integrations. To configure Slack and Microsoft Teams links that are created automatically with every new incident, see Incident Settings.
After you’ve moved an incident to the resolved status, an option appears in the header to generate a postmortem Notebook using a postmortem template. Configure your postmortem templates in the Incident Settings page to predefine the structure and content of your postmortems.
Use the Overview tab to specify an incident’s properties and define customer impact.
By default, all incidents have the following properties:
Properties are divided into the following three sections:
In Incident Settings, add property fields using the <KEY>:<VALUE> pairs from your Datadog metric tags, or create custom ones. Assigning values to an incident's properties lets you search for a subset of incidents on the Incident Homepage and build queries in Incident Management Analytics. You can also reorder your property fields and move them to different headings so that the most important properties are in prominent locations.
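To illustrate how <KEY>:<VALUE> property values narrow a search, here is a minimal Python sketch. It is a hypothetical in-memory model, not how Datadog stores or queries incidents: the `Incident` class, the `assign` and `search` helpers, and the AND semantics across multiple tags are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    # Hypothetical model: each property key maps to a set of assigned values.
    title: str
    properties: dict[str, set[str]] = field(default_factory=dict)


def parse_tag(tag: str) -> tuple[str, str]:
    """Split a '<KEY>:<VALUE>' pair at the first colon."""
    key, sep, value = tag.partition(":")
    if not sep or not key or not value:
        raise ValueError(f"expected <KEY>:<VALUE>, got {tag!r}")
    return key, value


def assign(incident: Incident, tag: str) -> None:
    """Add a property value to an incident (hypothetical helper)."""
    key, value = parse_tag(tag)
    incident.properties.setdefault(key, set()).add(value)


def search(incidents: list[Incident], *tags: str) -> list[Incident]:
    """Return incidents matching every given tag (AND semantics is an assumption)."""
    wanted = [parse_tag(t) for t in tags]
    return [
        inc for inc in incidents
        if all(v in inc.properties.get(k, set()) for k, v in wanted)
    ]


a = Incident("Checkout errors")
assign(a, "service:web-store")
assign(a, "team:payments")
b = Incident("Slow search")
assign(b, "service:search")
print([i.title for i in search([a, b], "service:web-store")])  # ['Checkout errors']
```

Splitting at the first colon keeps values that themselves contain colons (such as URLs) intact.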
If your incident is customer-facing, specify the details in the Impact section, including the Scope of impact.
In addition to housing your property fields, the Overview tab also provides the following at-a-glance summary modules:
The Incident Timeline is the primary source of information for the work done during an incident. As actions are performed, new cells are added to the timeline in chronological order, each recording the change made, the person who made it, and the time it was made.
Each cell has its own content type that indicates the kind of information the cell contains:
Content type | Description |
---|---|
Responder note | A note manually written by an incident responder. Responder notes have the following sub-types: Graph (the note contains one or more Datadog graphs), Link (the note contains a hyperlink), and Code (the note contains text wrapped in Markdown code block syntax). |
Incident update | Any changes made to an incident’s properties (including status and severity) or its impact. |
Integration update | Any changes made through the Incident Management product’s integrations. |
Task | Any changes made to incident tasks in the Remediation section of the Incident Details page. |
Notification sent | An update when a manual notification is sent by an incident responder. |
Add responder notes directly to the timeline using the text box just below the section tabs of the Incident Details page. When you create a responder note, you can customize its timestamp so that information relevant to an earlier point in time appears in the correct chronological position on the timeline. For responder notes you've authored, you can edit the content or timestamp, or delete the note entirely. You can also copy a link to a specific cell to share with teammates. Responder notes can also be added to the timeline from Slack.
For graph cells, graph definitions are stored using graph share URLs if share URLs are enabled in your Organization Settings. When a graph cell is added to the timeline, it has the same fully interactive hover states found in Dashboards, Notebooks, and other pages. 24 hours after a graph cell is added to the timeline, its graphs are replaced with static images of what they were displaying. This ensures that graphs showing short-retention data still have a backup image after the live data has expired.
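The 24-hour rule for graph cells can be expressed as a small function. This is a hypothetical Python model of the documented behavior, not Datadog code; the function name `graph_cell_mode` is illustrative, and treating exactly 24 hours as already static is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Window taken from the text above; the exact boundary handling is an assumption.
LIVE_WINDOW = timedelta(hours=24)


def graph_cell_mode(added_at: datetime, now: datetime) -> str:
    """Return 'interactive' while the cell is younger than 24 hours, else 'static'."""
    if now < added_at:
        raise ValueError("now must not be earlier than added_at")
    return "interactive" if now - added_at < LIVE_WINDOW else "static"


added = datetime(2024, 5, 1, 9, 0, tzinfo=timezone.utc)
print(graph_cell_mode(added, added + timedelta(hours=23)))  # interactive
print(graph_cell_mode(added, added + timedelta(hours=25)))  # static
```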
By default, timeline cells are sorted oldest first, but you can change this to newest first using the button at the top of the timeline.
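The ordering rules described above (chronological cells, backdated responder notes, and the oldest-first/newest-first toggle) can be sketched in Python. The `TimelineCell` class and `ordered` helper are hypothetical and only model the behavior the text describes; the authors and messages are made-up sample data.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TimelineCell:
    # Hypothetical model of a timeline cell: what changed, who changed it, and when.
    content_type: str  # for example "Responder note" or "Incident update"
    author: str
    timestamp: datetime
    text: str


def ordered(cells: list[TimelineCell], newest_first: bool = False) -> list[TimelineCell]:
    """Sort cells by timestamp; oldest first by default, mirroring the timeline toggle."""
    return sorted(cells, key=lambda c: c.timestamp, reverse=newest_first)


cells = [
    TimelineCell("Incident update", "alice", datetime(2024, 5, 1, 10, 0), "Severity changed"),
    TimelineCell("Task", "bob", datetime(2024, 5, 1, 10, 30), "Task completed"),
    # Written later, but backdated to 09:45, so it sorts before the 10:00 update.
    TimelineCell("Responder note", "carol", datetime(2024, 5, 1, 9, 45), "Error rate rising"),
]
print([c.author for c in ordered(cells)])                     # ['carol', 'alice', 'bob']
print([c.author for c in ordered(cells, newest_first=True)])  # ['bob', 'alice', 'carol']
```

Sorting on the stored timestamp, rather than on creation order, is what lets a backdated note land where it belongs in the chronology.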
Use the Remediation section to store any documents or resources that are relevant to the remediation process of an incident, as well as to track key tasks for the remediation process.
Documents can be added by pasting the document URL and giving the link a human-readable name for quick access.
You can create incident tasks directly in the Remediation section or through Datadog's Slack integration.
From the Remediation section, type the description of your task in the creation text box. To assign a task to a Datadog user, type @ in the description text box, or use the Assignees column after the task has been created. An incident task can have more than one assignee. After a task has been created, it can also be assigned a due date.
As work on each task is finished, mark the task as completed by clicking the checkbox to the left of its description. If you have a large number of tasks, you can filter them by searching for keywords or by hiding completed tasks from view.
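The task features above (multiple assignees, an optional due date, completion, keyword search, and hiding completed tasks) map naturally onto a small data structure. This Python sketch is a hypothetical model; `IncidentTask`, `visible_tasks`, and the case-insensitive substring match are assumptions, not Datadog's schema or search behavior.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class IncidentTask:
    # Hypothetical model; field names are illustrative, not a Datadog schema.
    description: str
    assignees: set[str] = field(default_factory=set)  # more than one assignee is allowed
    due: Optional[date] = None
    completed: bool = False


def visible_tasks(
    tasks: list[IncidentTask], keyword: str = "", hide_completed: bool = False
) -> list[IncidentTask]:
    """Filter by a case-insensitive keyword and optionally hide completed tasks."""
    kw = keyword.lower()
    return [
        t for t in tasks
        if kw in t.description.lower() and not (hide_completed and t.completed)
    ]


tasks = [
    IncidentTask("Roll back checkout deploy", {"alice", "bob"}, date(2024, 5, 2)),
    IncidentTask("Update status page", {"carol"}),
    IncidentTask("Roll back search config", {"dave"}),
]
tasks[0].completed = True  # equivalent to ticking the checkbox
print([t.description for t in visible_tasks(tasks, "roll back", hide_completed=True)])
# ['Roll back search config']
```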
In the Response Team section, you can form your response team by adding other users and assigning them roles to carry out in the process of resolving an incident. The two default responder types provided by Datadog are:
- Incident Commander: The individual responsible for leading the response team.
- Responder: An individual who actively contributes to investigating an incident and resolving its underlying issue.
If you want to create custom responder roles, you can do so in the Incident Settings for Responder Types. There you can create new responder types with custom names and descriptions, and choose whether each responder type is a one-person role or a multi-person role.
Note: These roles are unrelated to the roles in the Role Based Access Control (RBAC) system. RBAC roles control a user's permissions to access certain features in Datadog. The Responder Types system in Incident Management does not change a user's permissions in any way; instead, it lets you invite responders to your incidents and give them documented roles in your response process for visibility.
If you add an individual as a responder, they are notified through the email associated with their Datadog account. Anyone can change the role of a responder, but you can only remove an individual from an incident's Response Team if they have the general Responder role and have no activity in the incident. If an Incident Commander is already assigned to an incident, assigning another individual as Incident Commander transfers the role to them, and the previous Incident Commander is reassigned the general Responder role. The same reassignment happens whenever you reassign one of your custom one-person roles.
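The role-transfer and removal rules above can be captured in a few lines of Python. The `ResponseTeam` class below is a hypothetical model of the documented rules, not Datadog's implementation; the user names and the custom role "Communications Lead" are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ResponseTeam:
    # Hypothetical model of the documented rules, not Datadog's implementation.
    one_person_roles: set[str] = field(default_factory=lambda: {"Incident Commander"})
    roles: dict[str, str] = field(default_factory=dict)  # user -> role
    active: set[str] = field(default_factory=set)        # users with timeline activity

    def assign(self, user: str, role: str) -> None:
        """Give `user` a role; a one-person role is transferred from its current holder."""
        if role in self.one_person_roles:
            for other, current in self.roles.items():
                if current == role and other != user:
                    # The previous holder falls back to the general Responder role.
                    self.roles[other] = "Responder"
        self.roles[user] = role

    def remove(self, user: str) -> None:
        """Only a general Responder with no activity in the incident can be removed."""
        if self.roles.get(user) != "Responder" or user in self.active:
            raise PermissionError(f"{user} cannot be removed from the response team")
        del self.roles[user]


team = ResponseTeam()
team.assign("alice", "Incident Commander")
team.assign("bob", "Incident Commander")  # transfers the role from alice to bob
print(team.roles)  # {'alice': 'Responder', 'bob': 'Incident Commander'}

team.active.add("alice")   # alice has contributed to the timeline, so she cannot be removed
team.assign("carol", "Responder")
team.remove("carol")       # allowed: general Responder with no activity
```

A custom one-person role follows the same path once it is added to `one_person_roles`, for example `team.one_person_roles.add("Communications Lead")`.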
The Response Team list also saves the date and time when an individual was originally added to the response team of an incident, as well as the date and time when they last contributed something to the Incident Timeline.
All stakeholder notifications for an incident are consolidated in the Notifications section. You can manually create notifications, save them as drafts, and send them directly from this page. Automated notifications sent by Notification Rules for the incident are also listed in this section.
To create a manual notification:
Use the {{incident.created}} template variable to customize the time zone of your message. This template variable displays an option to set its time zone.
The Notifications section is separated into two lists: Drafts and Sent.
Both lists display:
The Sent list also shows whether a notification was sent manually or automatically by a notification rule. For automated notifications, the rule that triggered the notification is displayed.
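To show what choosing a time zone for the {{incident.created}} template variable means in practice, here is a hypothetical Python renderer. The `render` function, the output format, and the fixed UTC offsets (used instead of IANA time zone names so the sketch runs without tz data) are all assumptions; Datadog's actual rendering may differ.

```python
from datetime import datetime, timedelta, timezone


def render(template: str, created_utc: datetime, tz: timezone) -> str:
    """Replace {{incident.created}} with the creation time shown in the chosen time zone.

    Hypothetical renderer: the '%Y-%m-%d %H:%M %Z' format is an assumption.
    """
    local = created_utc.astimezone(tz)
    return template.replace("{{incident.created}}", local.strftime("%Y-%m-%d %H:%M %Z"))


created = datetime(2024, 5, 1, 14, 30, tzinfo=timezone.utc)
msg = "Incident declared at {{incident.created}}."
print(render(msg, created, timezone(timedelta(hours=-4), "EDT")))
# Incident declared at 2024-05-01 10:30 EDT.
print(render(msg, created, timezone(timedelta(hours=9), "KST")))
# Incident declared at 2024-05-01 23:30 KST.
```

Storing the creation time once in UTC and converting only when rendering keeps a single source of truth while letting each message show the time zone its audience expects.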
Work through an example workflow in the Getting Started with Incident Management guide.