Failed deployment events, currently interpreted through failure events, are used to compute change failure rate and time to restore.
PagerDuty is an incident management platform that equips IT teams with immediate incident visibility, enabling proactive and effective responses to maintain operational stability and resilience.
To integrate your PagerDuty account with DORA Metrics:
Enable PagerDuty as a failure data source in DORA settings.
Navigate to Integrations > Developer Tools in PagerDuty and click Generic Webhooks (v3).
Click + New Webhook and enter the following details:
Variable | Description |
---|---|
Webhook URL | Add https://webhook-intake./api/v2/webhook/ . |
Scope Type | Select Account to send incidents for all PagerDuty services in your account. Alternatively, you can send incidents for specific services or teams by selecting a different scope type. |
Description | A description helps distinguish the webhook. Add something like Datadog DORA Metrics integration . |
Event Subscription | Select the following events: - incident.acknowledged - incident.annotated - incident.custom_field_values.updated - incident.delegated - incident.escalated - incident.priority_updated - incident.reassigned - incident.reopened - incident.resolved - incident.triggered - incident.unacknowledged |
Custom Headers | Click Add custom header, enter DD-API-KEY as the name, and input your Datadog API key as the value. Optionally, you can add an environment to all of the PagerDuty incidents sent from the webhook by creating an additional custom header with the name dd_env and the desired environment as the value. |
To save the webhook, click Add Webhook.
The severity of the failure in the DORA Metrics product is based on the incident priority in PagerDuty.
Note: Upon webhook creation, a new secret is created and used to sign all the webhook payloads. That secret is not needed for the integration to work, as the authentication is performed using the API key instead.
When an incident event is received for a specific PagerDuty service, Datadog attempts to retrieve the related Datadog service and team from any triggering Datadog monitors and from the Software Catalog.
The matching algorithm works in the following steps:
1. If the PagerDuty incident event was triggered from a Datadog monitor:
   - If the monitor alert has an alerted group, the incident metrics and events are emitted with the `env`, `service`, and `team` from the alerted group.
   - If the monitor has tags for `env`, `service`, or `team`:
     - `env`: If the monitor has a single `env` tag, the incident metrics and events are emitted with the environment.
     - `service`: If the monitor has one or more `service` tags, the incident metrics and events are emitted with the provided services.
     - `team`: If the monitor has a single `team` tag, the incident metrics and events are emitted with the team.
2. If the service URL of the incident matches the PagerDuty service URL for any services in the Software Catalog, the incident metrics and events are emitted using the matching Software Catalog entries. For more information about setting the PagerDuty service URL for a Datadog service, see Use Integrations with Software Catalog.
3. If the PagerDuty service name of the incident matches a service name in the Software Catalog, the incident metrics and events are emitted with the service and team.
4. If the PagerDuty team name of the incident matches a team name in the Software Catalog, the incident metrics and events are emitted with the team.
5. If the PagerDuty service name of the incident matches a team name in the Software Catalog, the incident metrics and events are emitted with the team.
6. If there have been no matches up to this point, the incident metrics and events are emitted with the PagerDuty service and PagerDuty team provided in the incident.
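The name-based matching steps and the final fallback can be read as a simple precedence chain. The following is a minimal sketch of that ordering only, not Datadog's implementation: the `Incident` and `CatalogService` types, the `resolve_service_and_team` function, and the use of exact string comparison are all assumptions made for illustration, and the monitor-based and service-URL steps are left out.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple


@dataclass
class CatalogService:
    # Hypothetical stand-in for a Software Catalog service entry.
    name: str
    team: Optional[str] = None


@dataclass
class Incident:
    # Hypothetical, simplified view of a PagerDuty incident event.
    service_name: str
    team_name: Optional[str] = None


def resolve_service_and_team(
    incident: Incident,
    catalog_services: Iterable[CatalogService],
    catalog_teams: Set[str],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (service, team) using the name-based steps, in order."""
    # PagerDuty service name matches a catalog service name: service and team.
    for svc in catalog_services:
        if svc.name == incident.service_name:  # exact match is an assumption
            return svc.name, svc.team
    # PagerDuty team name matches a catalog team name: team only.
    if incident.team_name in catalog_teams:
        return None, incident.team_name
    # PagerDuty service name matches a catalog team name: team only.
    if incident.service_name in catalog_teams:
        return None, incident.service_name
    # No match: fall back to the PagerDuty service and team from the incident.
    return incident.service_name, incident.team_name
```

Because each step returns as soon as it matches, an incident whose PagerDuty service name equals a catalog service name never reaches the team-name checks.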
To send your own failure events, use the DORA Metrics API. Failure events are used to calculate change failure rate and time to restore.
Include the `finished_at` attribute in a failure event to mark that the failure is resolved. You can send events at the start of the failure and after it has been resolved. Failure events are matched by the `env`, `service`, and `started_at` attributes.
The following attributes are required:

- `services` or `team` (at least one must be present)
- `started_at`

You can optionally add the following attributes to the failure events:

- `finished_at` for resolved failures. Required for calculating time to restore.
- `id` for identifying failures. This attribute is user-generated; when not provided, the endpoint returns a Datadog-generated UUID.
- `name` to describe the failure.
- `severity`
- `env` to filter your DORA metrics by environment on the DORA Metrics page.
- `repository_url`
- `commit_sha`
- `version`

See the DORA Metrics API reference documentation for the full spec and additional code samples.
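As a rough illustration of the required and optional attributes, and of sending one event when a failure starts and another after it is resolved, here is a minimal sketch that only builds request bodies. The `build_failure_event` helper is hypothetical and not part of any Datadog client library; the attribute names and sample values are the ones used on this page.

```python
from typing import Optional, Sequence


def build_failure_event(
    started_at: int,
    services: Optional[Sequence[str]] = None,
    team: Optional[str] = None,
    **optional: object,
) -> dict:
    """Build a request body for a failure event (illustrative helper)."""
    # At least one of services or team must be present.
    if not services and not team:
        raise ValueError("at least one of services or team must be present")
    attributes: dict = {"started_at": started_at}
    if services:
        attributes["services"] = list(services)
    if team:
        attributes["team"] = team
    # Copy any optional attributes (env, name, severity, finished_at, ...) that were set.
    attributes.update({k: v for k, v in optional.items() if v is not None})
    return {"data": {"attributes": attributes}}


# Event sent when the failure starts.
opened = build_failure_event(
    1693491974000000000,
    services=["shopist"],
    env="prod",
    name="Web server is down failing all requests",
)

# Event sent after resolution: same env, service, and started_at, plus finished_at.
resolved = build_failure_event(
    1693491974000000000,
    services=["shopist"],
    env="prod",
    finished_at=1693491984000000000,
)
```

Keeping `env`, the service, and `started_at` identical across both bodies is what lets the two events be matched as the same failure.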
In the following example, replace `<DD_SITE>` with your Datadog site:
```shell
curl -X POST "https://api.<DD_SITE>/api/v2/dora/incident" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -H "DD-API-KEY: ${DD_API_KEY}" \
  -d @- << EOF
  {
    "data": {
      "attributes": {
        "services": ["shopist"],
        "team": "shopist-devs",
        "started_at": 1693491974000000000,
        "finished_at": 1693491984000000000,
        "git": {
          "commit_sha": "66adc9350f2cc9b250b69abddab733dd55e1a588",
          "repository_url": "https://github.com/organization/example-repository"
        },
        "env": "prod",
        "name": "Web server is down failing all requests",
        "severity": "High",
        "version": "v1.12.07"
      }
    }
  }
EOF
```
Change failure rate requires both deployment data and failure data.
Change failure rate is calculated as the percentage of failure events out of the total number of deployments. Datadog divides `Count of Failures` by `Count of Deployments` for the same services and/or teams associated with both a failure and a deployment event.
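The arithmetic can be sketched as follows. This is an illustration of the ratio described above, not Datadog's code: the `change_failure_rate` function name is made up, and returning `None` when there are no deployments is an assumption about how to handle an undefined ratio.

```python
from typing import Optional


def change_failure_rate(failure_count: int, deployment_count: int) -> Optional[float]:
    """Percentage of failures relative to deployments for one service or team."""
    if deployment_count == 0:
        # Assumption: the rate is undefined without deployment data.
        return None
    return 100.0 * failure_count / deployment_count


# 3 failures against 40 deployments of the same service.
rate = change_failure_rate(3, 40)
print(rate)  # 7.5
```

The zero-deployment branch reflects that the metric needs both deployment data and failure data to be meaningful.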
Time to restore is calculated as the duration distribution for resolved failure events.
DORA Metrics generates the `Time to Restore` metric by recording the start and end times of each failure event. It calculates the time to restore as the median of these `Time to Restore` data points over a selected time frame.
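A minimal sketch of the median calculation follows. It assumes that `started_at` and `finished_at` are Unix timestamps in nanoseconds, which is what the sample values on this page suggest, and that unresolved failures (no `finished_at`) contribute no data point. The `median_time_to_restore_seconds` function is hypothetical.

```python
from statistics import median
from typing import Iterable, Mapping, Optional

NANOSECONDS_PER_SECOND = 1_000_000_000


def median_time_to_restore_seconds(failures: Iterable[Mapping[str, int]]) -> Optional[float]:
    """Median restore duration, in seconds, over resolved failure events."""
    durations = [
        (f["finished_at"] - f["started_at"]) / NANOSECONDS_PER_SECOND
        for f in failures
        if f.get("finished_at") is not None  # skip failures that are still open
    ]
    return median(durations) if durations else None


failures = [
    {"started_at": 1693491974000000000, "finished_at": 1693491984000000000},  # 10 s
    {"started_at": 1693492000000000000, "finished_at": 1693492030000000000},  # 30 s
    {"started_at": 1693493000000000000, "finished_at": 1693493120000000000},  # 120 s
    {"started_at": 1693494000000000000},  # unresolved, ignored
]
print(median_time_to_restore_seconds(failures))  # 30.0
```

Using the median rather than the mean keeps a single long outage, such as the 120-second failure here, from dominating the reported value.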