Nomad

Supported OS: Linux, Windows, Mac OS

Nomad dashboard

Overview

Gather metrics from your Nomad clusters to:

  • Visualize and monitor cluster performance
  • Alert on cluster health and availability

Recommended monitors are available to notify you about different Nomad events.

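Setup

Installation

Nomad emits metrics to Datadog through DogStatsD. To enable the Nomad integration, install the Datadog Agent on each client and server host.

Configuration

Once the Datadog Agent is installed, add a telemetry stanza to the Nomad configuration for your clients and servers: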

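telemetry {
  # Publish per-allocation and per-node metrics
  publish_allocation_metrics = true
  publish_node_metrics       = true
  # Address of the local DogStatsD listener
  datadog_address     = "localhost:8125"
  # Do not prefix gauge names with the local hostname
  disable_hostname    = true
  # How often Nomad collects telemetry
  collection_interval = "10s"
}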

Next, reload or restart the Nomad agent on each host. Nomad metrics then begin to flow into your Datadog account.
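If no Nomad metrics appear, one quick check is to send a test gauge to the same DogStatsD address that the telemetry stanza points at. The sketch below is a minimal illustration, assuming the Agent listens for DogStatsD over UDP on localhost:8125 as in the configuration above. The metric name nomad_setup.test_gauge, the tag, and both function names are hypothetical and exist only for this test.

```python
import socket


def format_gauge(name, value, tags=None):
    """Build a DogStatsD gauge datagram: <name>:<value>|g[|#tag1,tag2]."""
    payload = f"{name}:{value}|g"
    if tags:
        payload += "|#" + ",".join(tags)
    return payload


def send_gauge(name, value, tags=None, host="localhost", port=8125):
    """Send one gauge over UDP and return the number of bytes sent.

    UDP is fire-and-forget: a successful send does not prove that
    anything is listening on the port.
    """
    data = format_gauge(name, value, tags).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(data, (host, port))
    return len(data)


if __name__ == "__main__":
    # Hypothetical metric name and tag, used only to verify the pipeline.
    send_gauge("nomad_setup.test_gauge", 1, tags=["source:manual_test"])
```

If the test gauge shows up in Datadog but Nomad metrics do not, the problem is more likely in the Nomad telemetry configuration than in the Agent.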

Data Collected

Metrics

nomad.client.allocated.cpu
(gauge)
Amount of CPU allocated for a client.
Shown as megahertz
nomad.client.allocated.disk
(gauge)
Amount of disk allocated for a client.
nomad.client.allocated.iops
(gauge)
Number of iops allocated for a client.
Shown as operation
nomad.client.allocated.memory
(gauge)
Amount of memory allocated for a client.
nomad.client.allocated.network
(gauge)
Bandwidth allocation for a client.
nomad.client.allocations.blocked
(gauge)
Number of allocations blocked for a client.
Shown as job
nomad.client.allocations.migrating
(gauge)
Number of allocations migrating for a client.
Shown as job
nomad.client.allocations.pending
(gauge)
Number of allocations pending for a client.
Shown as job
nomad.client.allocations.running
(gauge)
Number of allocations running for a client.
Shown as job
nomad.client.allocations.start
(gauge)
Number of allocations starting for a client.
Shown as job
nomad.client.allocations.terminal
(gauge)
Number of allocations terminated for a client.
Shown as job
nomad.client.allocs.cpu.allocated
(gauge)
Total CPU resources allocated by the task across all cores.
Shown as megahertz
nomad.client.allocs.cpu.system
(gauge)
Total CPU resources consumed by the task in system space.
Shown as percent
nomad.client.allocs.cpu.throttled_periods
(gauge)
Total number of CPU periods that the task was throttled.
Shown as nanosecond
nomad.client.allocs.cpu.throttled_time
(gauge)
Total time that the task was throttled.
Shown as nanosecond
nomad.client.allocs.cpu.total_percent
(gauge)
Total CPU resources consumed by the task across all cores.
Shown as percent
nomad.client.allocs.cpu.total_ticks
(gauge)
CPU ticks consumed by the process in the last collection interval.
nomad.client.allocs.cpu.user
(gauge)
Total CPU resources consumed by the task in the user space.
Shown as percent
nomad.client.allocs.memory.allocated
(gauge)
Amount of memory allocated by the task.
Shown as byte
nomad.client.allocs.memory.cache
(gauge)
Amount of memory cached by the task.
Shown as byte
nomad.client.allocs.memory.kernel_max_usage
(gauge)
Maximum amount of memory ever used by the kernel for this task.
Shown as byte
nomad.client.allocs.memory.kernel_usage
(gauge)
Amount of memory used by the kernel for this task.
Shown as byte
nomad.client.allocs.memory.max_usage
(gauge)
Maximum amount of memory ever used by the task.
Shown as byte
nomad.client.allocs.memory.rss
(gauge)
Amount of RSS memory consumed by the task.
Shown as byte
nomad.client.allocs.memory.swap
(gauge)
Amount of memory swapped by the task.
Shown as byte
nomad.client.allocs.memory.usage
(gauge)
Total amount of memory used by the task.
Shown as byte
nomad.client.allocs.oom_killed
(gauge)
Number of allocations OOM killed.
nomad.client.consul.check_registrations
(gauge)
Number of Consul check registrations.
nomad.client.consul.checks
(gauge)
Number of Consul checks.
nomad.client.consul.script_checks
(gauge)
Number of Consul script checks.
nomad.client.consul.service_registrations
(gauge)
Number of Consul service registrations.
nomad.client.consul.services
(gauge)
Number of Consul services.
nomad.client.host.cpu.idle
(gauge)
Amount of CPU idle for a client.
Shown as percent
nomad.client.host.cpu.system
(gauge)
Amount of CPU consumed by the system for a client.
Shown as percent
nomad.client.host.cpu.total
(gauge)
Amount of CPU total for a client.
Shown as percent
nomad.client.host.cpu.user
(gauge)
Amount of CPU consumed in user space for a client.
Shown as percent
nomad.client.host.disk.available
(gauge)
Disk available for a particular client.
Shown as byte
nomad.client.host.disk.inodes_percent
(gauge)
Disk inodes used as a percentage for a particular client.
Shown as percent
nomad.client.host.disk.size
(gauge)
Disk size for a particular client.
Shown as byte
nomad.client.host.disk.used
(gauge)
Disk used for a particular client.
Shown as byte
nomad.client.host.disk.used_percent
(gauge)
Disk used as a percentage for a particular client.
Shown as percent
nomad.client.host.memory.available
(gauge)
Amount of memory available for a client.
Shown as byte
nomad.client.host.memory.free
(gauge)
Amount of memory free for a client.
Shown as byte
nomad.client.host.memory.total
(gauge)
Total amount of memory for a client.
Shown as byte
nomad.client.host.memory.used
(gauge)
Amount of memory used for a client.
Shown as byte
nomad.client.unallocated.cpu
(gauge)
Amount of unallocated CPU for a client.
Shown as megahertz
nomad.client.unallocated.disk
(gauge)
Amount of unallocated disk for a client.
nomad.client.unallocated.iops
(gauge)
Number of unallocated iops for a client.
Shown as operation
nomad.client.unallocated.memory
(gauge)
Amount of unallocated memory for a client.
nomad.client.unallocated.network
(gauge)
Unallocated bandwidth for a client.
nomad.client.uptime
(gauge)
Uptime of the host running the Nomad client.
Shown as second
nomad.memberlist.gossip.95percentile
(gauge)
95th percentile of members in the gossip pool.
Shown as resource
nomad.memberlist.gossip.avg
(gauge)
Average number of members in the gossip pool.
Shown as resource
nomad.memberlist.gossip.count
(rate)
Number of members in the gossip pool.
Shown as resource
nomad.memberlist.gossip.max
(gauge)
Maximum number of members in the gossip pool.
Shown as resource
nomad.memberlist.gossip.median
(gauge)
Median number of members in the gossip pool.
Shown as resource
nomad.memberlist.msg.alive
(gauge)
Number of messages from alive members.
Shown as resource
nomad.memberlist.tcp.accept
(gauge)
Number of accepted TCP connections.
Shown as resource
nomad.nomad.blocked_evals.total_blocked
(gauge)
Number of blocked evaluations.
Shown as job
nomad.nomad.blocked_evals.total_escaped
(gauge)
Number of blocked evaluations that have escaped.
Shown as job
nomad.nomad.blocked_evals.total_quota_limit
(gauge)
Number of evaluations blocked because of quota limits.
Shown as job
nomad.nomad.broker.total_blocked
(gauge)
Evaluations that are blocked until an existing evaluation for the same job completes.
nomad.nomad.broker.total_ready
(gauge)
Number of evaluations ready to be processed.
nomad.nomad.broker.total_unacked
(gauge)
Evaluations dispatched for processing but incomplete.
nomad.nomad.broker.total_waiting
(gauge)
Number of evaluations waiting to be processed.
nomad.nomad.client.get_client_allocs.95percentile
(gauge)
The 95th percentile of Nomad client allocations.
nomad.nomad.client.get_client_allocs.avg
(gauge)
The average number of Nomad client allocations.
nomad.nomad.client.get_client_allocs.count
(gauge)
The number of Nomad client allocations.
nomad.nomad.client.get_client_allocs.max
(gauge)
The maximum number of Nomad client allocations.
nomad.nomad.job_summary.complete
(gauge)
Number of complete allocations for a job.
nomad.nomad.job_summary.failed
(gauge)
Number of failed allocations for a job.
nomad.nomad.job_summary.lost
(gauge)
Number of lost allocations for a job.
nomad.nomad.job_summary.queued
(gauge)
Number of queued allocations for a job.
nomad.nomad.job_summary.running
(gauge)
Number of running allocations for a job.
nomad.nomad.job_summary.starting
(gauge)
Number of starting allocations for a job.
nomad.nomad.job_status.dead
(gauge)
Number of dead jobs.
Shown as job
nomad.nomad.job_status.pending
(gauge)
Number of pending jobs.
Shown as job
nomad.nomad.job_status.running
(gauge)
Number of running jobs.
Shown as job
nomad.nomad.acl.bootstrap
(gauge)
Time elapsed for ACL.Bootstrap RPC call.
Shown as nanosecond
nomad.nomad.acl.delete_policies
(gauge)
Time elapsed for ACL.DeletePolicies RPC call.
Shown as nanosecond
nomad.nomad.acl.delete_tokens
(gauge)
Time elapsed for ACL.DeleteTokens RPC call.
Shown as nanosecond
nomad.nomad.acl.get_policies
(gauge)
Time elapsed for ACL.GetPolicies RPC call.
Shown as nanosecond
nomad.nomad.acl.get_policy
(gauge)
Time elapsed for ACL.GetPolicy RPC call.
Shown as nanosecond
nomad.nomad.acl.get_token
(gauge)
Time elapsed for ACL.GetToken RPC call.
Shown as nanosecond
nomad.nomad.acl.get_tokens
(gauge)
Time elapsed for ACL.GetTokens RPC call.
Shown as nanosecond
nomad.nomad.acl.list_policies
(gauge)
Time elapsed for ACL.ListPolicies RPC call.
Shown as nanosecond
nomad.nomad.acl.list_tokens
(gauge)
Time elapsed for ACL.ListTokens RPC call.
Shown as nanosecond
nomad.nomad.rpc.query
(gauge)
Number of RPC queries.
nomad.nomad.rpc.request
(gauge)
Number of RPC requests being handled.
nomad.runtime.alloc_bytes
(gauge)
Memory utilization.
Shown as byte
nomad.runtime.free_count
(gauge)
Count of objects freed from the heap by the Go runtime GC.
nomad.runtime.gc_pause_ns
(gauge)
Go runtime GC pause times.
Shown as nanosecond
nomad.runtime.heap_objects
(gauge)
Number of objects on the heap. General memory pressure indicator.
nomad.runtime.num_goroutines
(gauge)
Number of goroutines and general load pressure indicator.
nomad.runtime.sys_bytes
(gauge)
Go runtime GC metadata size.
Shown as byte
nomad.runtime.total_gc_pause_ns
(gauge)
Total elapsed Go runtime GC pause times.
Shown as nanosecond
nomad.runtime.total_gc_runs
(gauge)
Count of Go runtime GC runs.
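As a rough illustration of how the gauges above might be combined, the sketch below derives the share of a client's CPU that is currently allocated from nomad.client.allocated.cpu and nomad.client.unallocated.cpu, assuming the two values sum to the client's allocatable capacity. It also converts a nanosecond-valued metric such as nomad.runtime.gc_pause_ns to milliseconds. The function names and sample values are hypothetical.

```python
def allocated_fraction(allocated, unallocated):
    """Return allocated / (allocated + unallocated), or None if the total is not positive."""
    total = allocated + unallocated
    if total <= 0:
        return None
    return allocated / total


def ns_to_ms(nanoseconds):
    """Convert a nanosecond value to milliseconds."""
    return nanoseconds / 1_000_000


if __name__ == "__main__":
    # Hypothetical sample values in MHz for a single client.
    print(allocated_fraction(3000, 1000))
    # Hypothetical GC pause of 2,500,000 ns.
    print(ns_to_ms(2_500_000))
```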

Events

The Nomad check does not include any events.

Service Checks

The Nomad check does not include any service checks.

Troubleshooting

Need help? Contact Datadog support.
