Infrastructure monitoring provides visibility into your entire IT environment, including cloud-hosted and on-prem servers, through many integrations. Use the Host monitor to track which hosts are or are not submitting data, so that you keep continuous visibility.
Every Datadog Agent reports a service check called `datadog.agent.up` with the status `OK`. You can monitor this check across one or more hosts by using a host monitor.
Note: AIX Agents do not report the `datadog.agent.up` service check. To monitor the uptime of an AIX Agent, use the metric `datadog.agent.running` instead. The metric emits a value of `1` if the Agent is reporting to Datadog.

To create a host monitor in Datadog, use the main navigation: Monitors –> New Monitor –> Host.
Select the hosts to monitor by choosing host names or tags, or choose All Monitored Hosts. If you need to exclude certain hosts, use the second field to list names or tags.

- The include field uses AND logic. All listed names and tags must be present on a host for it to be included.
- The exclude field uses OR logic. Any host with a listed name or tag is excluded.

| Monitor | Include | Exclude |
|---|---|---|
| Include all hosts with the tag `env:prod` | `env:prod` | leave empty |
| Include all hosts except hosts with the tag `env:test` | All Monitored Hosts | `env:test` |
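The include and exclude rules can be modeled as a small filter. This is a minimal sketch that assumes each host is described by a name and a set of `key:value` tags, and that an empty include list stands for All Monitored Hosts. `Host` and `select_hosts` are hypothetical names used only for illustration. They are not part of any Datadog API and do not reflect Datadog's implementation.

```python
from dataclasses import dataclass, field


# Hypothetical host model for illustration; not Datadog's implementation.
@dataclass
class Host:
    name: str
    tags: set = field(default_factory=set)

    def matches(self, term: str) -> bool:
        # A term matches if it is the host name or one of the host's tags.
        return term == self.name or term in self.tags


def select_hosts(hosts, include=None, exclude=None):
    """Return the hosts a monitor would cover under this simplified model.

    include: names/tags combined with AND logic. None or an empty list is
             assumed here to mean "All Monitored Hosts".
    exclude: names/tags combined with OR logic.
    """
    include = include or []
    exclude = exclude or []
    selected = []
    for host in hosts:
        # AND: every include term must be present (all([]) is True).
        if not all(host.matches(t) for t in include):
            continue
        # OR: any single exclude term removes the host.
        if any(host.matches(t) for t in exclude):
            continue
        selected.append(host)
    return selected


hosts = [
    Host("web-1", {"env:prod", "role:web"}),
    Host("web-2", {"env:test", "role:web"}),
    Host("db-1", {"env:prod", "role:db"}),
]

# The two rows of the table above:
print([h.name for h in select_hosts(hosts, include=["env:prod"])])  # ['web-1', 'db-1']
print([h.name for h in select_hosts(hosts, exclude=["env:test"])])  # ['web-1', 'db-1']
```

Adding a second include term narrows the selection (`["env:prod", "role:web"]` keeps only `web-1`). Adding a second exclude term widens what is removed (`["env:test", "role:db"]` also keeps only `web-1`).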
Next, choose between a Check Alert and a Cluster Alert:
A check alert tracks whether a host stops reporting for a given amount of time. A long gap since the last check run can be a sign of problems with data submission from the host.
Enter the number of minutes to check for missing data. The default value is 2 minutes.
If `datadog.agent.up` stops reporting an `OK` status for more than the specified number of minutes, an alert is triggered.
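The check-alert rule can be sketched as a comparison between the time of each host's last `OK` report and the configured window. This is a simplified model, not Datadog's evaluation engine. The names `check_alert` and `evaluate_hosts` are hypothetical, and the sketch assumes you already know each host's most recent `OK` timestamp.

```python
from datetime import datetime, timedelta
from typing import Optional


def check_alert(last_ok: Optional[datetime], now: datetime, minutes: int = 2) -> bool:
    """Return True if a host should alert under this simplified model.

    last_ok is the time of the host's most recent OK datadog.agent.up report,
    or None if no OK report has been seen. Hypothetical helper, not a Datadog API.
    """
    if last_ok is None:
        return True
    # Alert only when the gap is strictly longer than the configured window.
    return now - last_ok > timedelta(minutes=minutes)


def evaluate_hosts(last_ok_by_host, now, minutes=2):
    # Map each host name to "ALERT" or "OK".
    return {
        host: "ALERT" if check_alert(last_ok, now, minutes) else "OK"
        for host, last_ok in last_ok_by_host.items()
    }


now = datetime(2024, 1, 1, 12, 0)
last_seen = {
    "web-1": now - timedelta(minutes=1),  # reported recently
    "web-2": now - timedelta(minutes=5),  # silent longer than the 2-minute default
    "web-3": None,                        # never reported OK
}
print(evaluate_hosts(last_seen, now))
# {'web-1': 'OK', 'web-2': 'ALERT', 'web-3': 'ALERT'}
```

Raising `minutes` makes the monitor more tolerant of short reporting gaps. With `minutes=10`, `web-2` would stay `OK`.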
A cluster alert tracks whether some percentage of hosts have stopped reporting for a given amount of time.

To set up a cluster alert, choose how the status percentage is calculated:

- Ungrouped calculates the status percentage across all included hosts.
- Grouped calculates the status percentage on a per-group basis.

If `datadog.agent.up` stops reporting an `OK` status for more than the specified number of minutes and the percentage threshold is reached, an alert is triggered.
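The difference between Ungrouped and Grouped can be illustrated with a small percentage calculation. This is a minimal sketch under several assumptions. Grouping is done by a single tag key such as `env`. A host counts as missing when it has no `OK` report within the window. The alert fires when the missing percentage reaches the threshold. `cluster_alert`, `is_missing`, and `tag_value` are hypothetical helpers, not Datadog functions.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Optional


def is_missing(last_ok: Optional[datetime], now: datetime, minutes: int) -> bool:
    # Assumption: a host is "missing" if it has not reported OK within the window.
    return last_ok is None or now - last_ok > timedelta(minutes=minutes)


def tag_value(tags, key):
    # Return the value of a "key:value" tag, or None if the host lacks it.
    for tag in tags:
        k, _, v = tag.partition(":")
        if k == key:
            return v
    return None


def cluster_alert(hosts, now, threshold_pct, minutes=2, group_by=None):
    """Return {group: (missing_pct, alerting)}. Hypothetical helper.

    group_by=None models "Ungrouped" (a single bucket labeled "*" here);
    a tag key such as "env" models "Grouped".
    """
    groups = defaultdict(list)
    for host in hosts:
        key = tag_value(host["tags"], group_by) if group_by else "*"
        groups[key].append(host)

    result = {}
    for key, members in groups.items():
        missing = sum(is_missing(h["last_ok"], now, minutes) for h in members)
        pct = 100.0 * missing / len(members)
        # The alert fires once the percentage threshold is reached.
        result[key] = (pct, pct >= threshold_pct)
    return result


now = datetime(2024, 1, 1, 12, 0)
fresh, stale = now - timedelta(minutes=1), now - timedelta(minutes=10)
hosts = [
    {"name": "web-1", "tags": {"env:prod"}, "last_ok": fresh},
    {"name": "web-2", "tags": {"env:prod"}, "last_ok": stale},
    {"name": "web-3", "tags": {"env:test"}, "last_ok": fresh},
    {"name": "web-4", "tags": {"env:test"}, "last_ok": fresh},
]
print(cluster_alert(hosts, now, threshold_pct=50))
# {'*': (25.0, False)}
print(cluster_alert(hosts, now, threshold_pct=50, group_by="env"))
# {'prod': (50.0, True), 'test': (0.0, False)}
```

The example shows why the choice matters. Across all hosts, only 25% are missing, which stays under a 50% threshold. Grouped by `env`, the same data reveals that half of `env:prod` has stopped reporting.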
For detailed instructions on the advanced alert options (auto resolve, new group delay, etc.), see the Monitor Configuration page.
For detailed instructions on the Configure notifications and automations section, see the Notifications page.