Any event that may lead to a disruption in your organization’s services can be described as an incident, and it is often necessary to have a set framework for handling these events. Datadog’s Incident Management feature provides a system through which your organization can effectively identify and mitigate incidents.
Incidents live in Datadog alongside the metrics, traces, and logs you are collecting. You can view and filter incidents that are relevant to you.
Incident Management requires no installation. Get started by taking a Learning Center course, reading our guided walkthrough, or declaring an incident.
To view your incidents, go to the Incidents page to see a feed of all ongoing incidents.
You can also view your Incidents list from your mobile device home screen, and create or manage incidents, by downloading the Datadog Mobile App, available on the Apple App Store and Google Play Store.
No matter where you create an incident, it’s important to describe it as thoroughly as possible to share the information with other people involved in your company’s incident management process.
When you create an incident, an incident modal comes up. This modal has several core elements:
Incident elements | Description |
---|---|
Title | (Required) Give your incident a descriptive title. |
Severity Level | (Required) Denotes the severity of your incident, from SEV-1 (most severe) to SEV-5 (least severe). If your incident is under initial investigation, and you do not know the severity yet, select UNKNOWN. Note: You can customize the description of each severity level to fit the requirements of your organization. |
Incident Commander | This person is assigned as the leader of the incident investigation. |
Attributes (Teams) | Assign the appropriate group of users to an incident using Datadog Teams. Members of the assigned team are automatically invited to the Slack channels. |
Notifications | Specify a user, Slack channel, or external email address to notify about this incident. |
Notes & Links | Include links to graphs, monitors, or security signals for additional awareness. |
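The required and optional elements in the table above can be sketched as a small data model. This is only an illustration: `IncidentDraft`, its field names, and `validate` are hypothetical and are not part of any Datadog API or client library. The sketch assumes that only the title and severity are required, as the table states.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Severity values named in the table above; UNKNOWN covers incidents
# that are still under initial investigation.
SEVERITIES = ("UNKNOWN", "SEV-1", "SEV-2", "SEV-3", "SEV-4", "SEV-5")


@dataclass
class IncidentDraft:
    """Hypothetical in-memory model of the incident modal (not a Datadog type)."""

    title: str                                               # required
    severity: str = "UNKNOWN"                                # required
    commander: Optional[str] = None                          # Incident Commander
    teams: List[str] = field(default_factory=list)           # Attributes (Teams)
    notifications: List[str] = field(default_factory=list)   # users, channels, emails
    notes: str = ""                                          # Notes & Links

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the draft is complete."""
        problems = []
        if not self.title.strip():
            problems.append("title is required")
        if self.severity not in SEVERITIES:
            problems.append(f"unknown severity: {self.severity!r}")
        return problems


draft = IncidentDraft(title="Checkout latency spike", severity="SEV-2", teams=["payments"])
print(draft.validate())
```

Checking a draft this way before declaring an incident mirrors what the modal enforces: a descriptive title and a severity, with everything else optional.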
An incident’s status can be updated directly on the incident’s overview page, or from Slack within the dedicated incident channel. To update an incident from its Slack channel, use this slash command to open the update modal: /datadog incident update
Update the impact section to specify customer impact, the start and end times of the impact, and whether the incident is still active. A description of the scope of impact is required to complete this section.
In the incident header, you can see the incident’s state, severity, timestamp, impact, and duration, as well as who has responded to the incident. You can also notify responders of updates. There are quick links to chat channels (if not using the Datadog Slack App), video conferencing, and the attached postmortem (if one has been added).
Timeline data is automatically categorized, so you can use the facets to filter through timeline content. This is particularly useful for long incidents with lengthy investigations, because it makes it easier for ICs and responders to see who is involved, what progress has been made, and what has already been investigated. As the author of timeline notes, you can edit their timestamps and messages as they are created. You can also flag timeline entries to highlight them to other people monitoring the incident.
The default statuses are Active, Stable, and Resolved. The Completed status can be enabled or disabled. You can customize the description of each status level to fit the requirements of your organization.
As the status of an incident changes, Datadog tracks time-to-resolution as follows:
Status Transition | Resolved Timestamp |
---|---|
Active to Resolved, Active to Completed | Current time |
Active to Resolved to Completed, Active to Completed to Resolved | Unchanged |
Active to Completed to Active to Resolved | Overridden on last transition |
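The table above can be read as a single rule: the resolved timestamp is set whenever an incident enters Resolved or Completed from a status that is neither of those, and is left alone when it moves between Resolved and Completed. The following is a minimal sketch of that reading, not Datadog's implementation. `IncidentClock` and `transition` are hypothetical names. Two assumptions are also not stated in the table: that Stable behaves like Active, and that reopening an incident keeps the old timestamp until the next resolution overrides it.

```python
from datetime import datetime, timedelta
from typing import Optional

# Statuses that count as "closed" for time-to-resolution purposes.
CLOSED = {"Resolved", "Completed"}


class IncidentClock:
    """Hypothetical tracker for the resolved timestamp of one incident."""

    def __init__(self, status: str = "Active") -> None:
        self.status = status
        self.resolved_at: Optional[datetime] = None

    def transition(self, new_status: str, now: datetime) -> Optional[datetime]:
        # Entering a closed status from an open one stamps the current time,
        # overriding any timestamp left over from an earlier resolution.
        if new_status in CLOSED and self.status not in CLOSED:
            self.resolved_at = now
        # Moving between Resolved and Completed leaves the timestamp unchanged.
        self.status = new_status
        return self.resolved_at


start = datetime(2024, 1, 1, 12, 0)
clock = IncidentClock()
clock.transition("Completed", start)                       # stamped at start
clock.transition("Active", start + timedelta(hours=1))     # reopened
clock.transition("Resolved", start + timedelta(hours=2))   # overridden here
print(clock.resolved_at)
```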
Assessment fields are the metadata and context that you can define per incident. These fields are key:value metric tags. You add the field keys in settings, and the values are then available when you assess the impact of an incident on the overview page. For example, you can add an Application field. The following fields are available for assessment in all incidents:
Incident Management collects the following analytic measures:
For more information about Incident Management graphs, see Incident Management Analytics.
In addition to integrating with Slack, Incident Management also integrates with:
Work through an example workflow in the Getting Started with Incident Management guide.