Ensuring system health and performance requires timely and actionable notifications. However, effective alerts depend not only on timing, but also on the number of alerts generated. Alert aggregation is critical in managing this balance.
For example, if you manage an e-commerce site and issues arise, do you want one summarized alert or multiple specific alerts detailing each aspect of the failure?
The answer depends on your system’s architecture, the nature of the issue, and your team’s workflow. Understanding alert aggregations can help ensure teams are informed of issues without excess notifications.
This guide explores various alert aggregation capabilities and strategies for different scenarios.
By default, Datadog sends an alert for each monitored group. However, you can choose to receive a single notification, regardless of how many monitored groups breach the threshold.
Consider this example: you have a monitor query grouped by multiple attributes, in this case `topic` and `partition`.
A simple alert monitor aggregates your alerts into a single unique alert.
Using the example, no matter which `topic` or `partition` breaches the threshold, the monitor sends a single alert. All notifications are aggregated into one alert.
For more information, see Configure Monitors - Simple alert.
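A multi alert monitor sends a notification for each unique combination of groups. Using the example, the monitor sends a separate notification each time a unique combination of `topic` and `partition` breaches the threshold.
For more information, see Configure Monitors - Multi alert.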
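The difference between the two modes can be pictured as a grouping step over the groups that currently breach the threshold. The following is a minimal sketch of that idea, not Datadog's implementation: the function name `plan_notifications` and the dictionary shape of a group are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def plan_notifications(breaching_groups, notify_by):
    """Bucket breaching groups into the notifications that would be sent.

    breaching_groups: list of dicts such as {"topic": "a", "partition": "1"}
    notify_by: ["*"] for a simple alert, or a list of group keys for a multi alert
    Returns a dict mapping each notification key to the groups it covers.
    (Hypothetical helper for illustration only.)
    """
    if not breaching_groups:
        return {}

    # Simple alert: every breaching group collapses into one notification.
    if notify_by == ["*"]:
        return {("*",): list(breaching_groups)}

    # Multi alert: one notification per unique combination of the chosen keys.
    buckets = defaultdict(list)
    for group in breaching_groups:
        key = tuple(group[k] for k in notify_by)
        buckets[key].append(group)
    return dict(buckets)


breaching = [
    {"topic": "order-events", "partition": "1"},
    {"topic": "order-events", "partition": "3"},
]

simple = plan_notifications(breaching, ["*"])
per_combination = plan_notifications(breaching, ["topic", "partition"])
per_topic = plan_notifications(breaching, ["topic"])
```

With two lagging partitions on the same topic, the simple alert yields one notification, grouping by `topic` and `partition` yields two, and grouping by `topic` alone yields one.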
You’re monitoring a Kafka-based logging system with the topics `error-logs` and `user-events`. If any partition gets a message lag of more than 500, you want to know about it, but you don’t need multiple alerts if multiple partitions are lagging.
A simple alert is useful for teams that don’t want excessive notifications but still need to act when issues arise.
However, if multiple partitions are lagging, you might not see every single affected partition in a single notification.
Multi alerts work well when a service is owned by multiple teams, each responsible for a dedicated component. Depending on which component causes an issue, a different team is notified.
You’re running an e-commerce order processing system, and messages are sent to the “order-events” topic.
If multiple partitions lag (for example, partition 1 and partition 3), you need separate alerts because different partitions might correspond to different types of orders (such as domestic vs. international).
This level of detail helps engineers respond quickly and with precision.
If you are managing your monitors with the API, use the variable `notify_by` to make your monitor a simple alert or a multi alert.
Type of Alert | Configuration Example |
---|---|
Simple Alert | `"notify_by": ["*"]` |
Multi Alert | `"notify_by": ["<group>"]`, for example, `"notify_by": ["topic"]` |
For more information, see the API documentation.
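As a rough illustration, the two configurations can be expressed as monitor payloads that differ only in `notify_by`. This sketch assumes `notify_by` is set inside the monitor's `options` object and that the query is grouped by `topic` and `partition`; the monitor name, message, metric name, and the helper `build_monitor_payload` are hypothetical placeholders. It only builds the JSON body and does not call the API.

```python
import json


def build_monitor_payload(notify_by):
    """Build a monitor definition body (hypothetical helper, illustration only)."""
    return {
        "name": "Kafka consumer lag",  # placeholder name
        "type": "query alert",
        # Assumed metric and threshold, grouped by topic and partition.
        "query": "avg(last_5m):avg:kafka.consumer_lag{*} by {topic,partition} > 500",
        "message": "Consumer lag is above 500.",  # placeholder message
        # Assumption: notify_by is one of the monitor options.
        "options": {"notify_by": notify_by},
    }


# Simple alert: one notification, regardless of which group breaches.
simple_alert = build_monitor_payload(["*"])

# Multi alert: one notification per topic that breaches.
multi_alert = build_monitor_payload(["topic"])

print(json.dumps(multi_alert, indent=2))
```

The groups listed in `notify_by` need to be a subset of the groups in the query, which is why the multi alert example uses `topic` from the `by {topic,partition}` clause.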