No matter where you declare an incident, it’s important to describe it as thoroughly as possible to share the information with other people involved in your organization’s incident management process. The incident details should give information on:
When you declare an incident, an incident modal comes up. This modal has several core elements:
Incident elements | Description |
---|---|
Title | (Required) Give your incident a descriptive title. |
Severity Level | (Required) Denotes the severity of your incident, from SEV-1 (most severe) to SEV-5 (least severe). If your incident is under initial investigation, and you do not know the severity yet, select UNKNOWN. Note: You can customize the description of each severity level to fit the requirements of your organization. |
Incident Commander | (Required) This person is assigned as the leader of the incident investigation. |
Attributes (Teams) | Assign the appropriate group of users to an incident using Datadog Teams. Members of the assigned team are automatically invited to the Slack channels. |
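As an illustration of the table above, the required and optional elements can be modeled as a small record with a validation step. This is a minimal sketch, not Datadog's API: the `IncidentDeclaration` class, its field names, and the `validate` helper are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List

# Severity values described in the table: SEV-1 (most severe) to SEV-5, plus UNKNOWN.
SEVERITIES = ["SEV-1", "SEV-2", "SEV-3", "SEV-4", "SEV-5", "UNKNOWN"]


@dataclass
class IncidentDeclaration:
    """Hypothetical model of the incident modal's core elements."""

    title: str                                       # required
    severity: str                                    # required
    commander: str                                   # required
    teams: List[str] = field(default_factory=list)   # optional Teams attribute

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the declaration is complete."""
        errors = []
        if not self.title.strip():
            errors.append("title is required")
        if self.severity not in SEVERITIES:
            errors.append("severity must be SEV-1 to SEV-5 or UNKNOWN")
        if not self.commander.strip():
            errors.append("incident commander is required")
        return errors
```

For an incident still under initial investigation, `IncidentDeclaration("Checkout latency spike", "UNKNOWN", "alice@example.com").validate()` returns an empty list, since UNKNOWN is an accepted severity.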
You can update an incident’s status and details on its Overview tab. Fill out the Overview tab with relevant details, including incident description, customer impact, affected services, incident responders, root cause, detection method, and severity, to give your teams all the information they need to investigate and resolve the incident.
Update the impact section to specify customer impact, the start and end times of the impact, and whether the incident is still active. This section also requires a description of the scope of impact to be completed.
The default statuses are Active, Stable, and Resolved. You can add the Completed status and customize the description of each status level in the Incident Settings page.
As the status of an incident changes, Datadog tracks time-to-resolution as follows:
Status Transition | Resolved Timestamp |
---|---|
Active to Resolved, Active to Completed | Current time |
Active to Resolved to Completed, Active to Completed to Resolved | Unchanged |
Active to Completed to Active to Resolved | Overridden on last transition |
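The transition rules in the table can be read as a small state machine. The sketch below is one way to model them and is not Datadog's implementation: the `IncidentClock` class is hypothetical, and two behaviors are assumptions that the table does not state, namely that Stable behaves like Active, and that reopening an incident keeps the old timestamp until the next resolution overrides it.

```python
from datetime import datetime, timezone
from typing import Optional

# Statuses that stop the time-to-resolution clock, per the table above.
TERMINAL = {"Resolved", "Completed"}


class IncidentClock:
    """Hypothetical model of how the resolved timestamp is tracked."""

    def __init__(self) -> None:
        self.status = "Active"
        self.resolved_at: Optional[datetime] = None

    def transition(self, new_status: str, now: datetime) -> None:
        entering_terminal = new_status in TERMINAL and self.status not in TERMINAL
        if entering_terminal:
            # Active -> Resolved/Completed: set (or override) with the current time.
            self.resolved_at = now
        # Resolved <-> Completed: the timestamp is left unchanged.
        # Assumption: reopening keeps the old value until the next resolution.
        self.status = new_status


# Usage: Active -> Completed -> Active -> Resolved overrides on the last transition.
clock = IncidentClock()
clock.transition("Completed", datetime(2024, 1, 1, 10, tzinfo=timezone.utc))
clock.transition("Active", datetime(2024, 1, 1, 11, tzinfo=timezone.utc))
clock.transition("Resolved", datetime(2024, 1, 1, 12, tzinfo=timezone.utc))
print(clock.resolved_at)
```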
Form your response team by adding other users and assigning them roles to carry out while resolving the incident. Datadog provides two default responder types: Incident Commander and Responder.
Responders are notified through the email associated with their Datadog account. Anyone is able to change the role of a responder, but to remove an individual from an incident’s Response Team you must have the general Responder role assigned and have no activity in the incident. If there is already an Incident Commander assigned to an incident, assigning another individual as the Incident Commander transfers that role over to them. The previous Incident Commander is reassigned the general Responder role. A similar reassignment happens whenever you reassign one of your custom one-person roles.
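The handover behavior for one-person roles can be sketched as follows. This is an illustrative model only: the `ResponseTeam` class, its `assign` method, and the `ONE_PERSON_ROLES` set are hypothetical names, not part of any Datadog API.

```python
from dataclasses import dataclass, field
from typing import Dict

COMMANDER = "Incident Commander"
RESPONDER = "Responder"

# Roles that only one person can hold at a time; custom one-person roles would be added here.
ONE_PERSON_ROLES = {COMMANDER}


@dataclass
class ResponseTeam:
    """Hypothetical model of a response team: maps each user to one role."""

    roles: Dict[str, str] = field(default_factory=dict)

    def assign(self, user: str, role: str) -> None:
        if role in ONE_PERSON_ROLES:
            # Demote whoever currently holds the role to the general Responder role.
            for other, current in self.roles.items():
                if current == role and other != user:
                    self.roles[other] = RESPONDER
        self.roles[user] = role


team = ResponseTeam()
team.assign("alice", COMMANDER)
team.assign("bob", COMMANDER)  # alice becomes a general Responder
```

Multi-person roles skip the demotion step, so several users can hold the same role at once.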
The Response Team tab saves the date and time when an individual was originally added to the response team of an incident, as well as the date and time when they last contributed something to the Incident Timeline.
You can create custom responder roles in the Incident Settings for Responder Types, each with its own name and description. You can also choose whether a responder type is a one-person role or a multi-person role.
Attributes are the metadata and context that you can define for each incident. These fields are key:value metric tags. Add these field keys on the Incident Settings Property Fields page. The values you add are then available when you are assessing the impact of an incident on the Overview tab. The following fields are available for assessment in all incidents:
Configure incident notifications to share incident updates with all stakeholders and keep all involved members aware of the current investigation. For more information, see the Notification page.