Collect Etcd metrics to:
The Etcd check is included in the Datadog Agent package, so you don’t need to install anything else on your Etcd instance(s).
To configure this check for an Agent running on a host:
Edit the etcd.d/conf.yaml file, in the conf.d/ folder at the root of your Agent’s configuration directory, to start collecting your Etcd performance data. See the sample etcd.d/conf.yaml for all available configuration options.
Collecting logs is disabled by default in the Datadog Agent. Enable it in your datadog.yaml file:
logs_enabled: true
Uncomment and edit this configuration block at the bottom of your etcd.d/conf.yaml:
logs:
  - type: file
    path: "<LOG_FILE_PATH>"
    source: etcd
    service: "<SERVICE_NAME>"
Change the path and service parameter values based on your environment. See the sample etcd.d/conf.yaml for all available configuration options.
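For metric collection on a host, a minimal etcd.d/conf.yaml might look like the sketch below. It reuses the prometheus_url parameter shown in the Autodiscovery template on this page; the localhost address and port 2379 are assumptions about where your etcd member exposes its metrics, so adjust them to your environment.

```yaml
# etcd.d/conf.yaml: minimal sketch for metric collection on a host.
# Assumption: etcd serves Prometheus metrics on localhost:2379.
init_config:

instances:
  - prometheus_url: http://localhost:2379/metrics
```

The Agent typically needs a restart before it picks up changes to this file.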
For containerized environments, see the Autodiscovery Integration Templates for guidance on applying the parameters below.
Parameter | Value |
---|---|
<INTEGRATION_NAME> | etcd |
<INIT_CONFIG> | blank or {} |
<INSTANCE_CONFIG> | {"prometheus_url": "http://%%host%%:2379/metrics"} |
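As an illustration, the parameters above could be applied as Kubernetes pod annotations. This is a sketch that assumes the `ad.datadoghq.com/<CONTAINER_NAME>.*` annotation format and a container named etcd; the pod name and image are placeholders, not values from this page.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: etcd                     # hypothetical pod name
  annotations:
    # The "etcd" segment after the slash must match the container name below.
    ad.datadoghq.com/etcd.check_names: '["etcd"]'
    ad.datadoghq.com/etcd.init_configs: '[{}]'
    ad.datadoghq.com/etcd.instances: '[{"prometheus_url": "http://%%host%%:2379/metrics"}]'
spec:
  containers:
    - name: etcd
      image: "<ETCD_IMAGE>"      # placeholder
```

The %%host%% template variable is resolved by the Agent to the container's IP address at runtime.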
Collecting logs is disabled by default in the Datadog Agent. To enable it, see Kubernetes log collection.
Parameter | Value |
---|---|
<LOG_CONFIG> | {"source": "etcd", "service": "<SERVICE_NAME>"} |
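The log configuration can be attached in the same way. The fragment below is a sketch assuming the `ad.datadoghq.com/<CONTAINER_NAME>.logs` annotation and a container named etcd; replace <SERVICE_NAME> with your own value.

```yaml
metadata:
  annotations:
    # Assumes the container is named "etcd".
    ad.datadoghq.com/etcd.logs: '[{"source": "etcd", "service": "<SERVICE_NAME>"}]'
```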
Run the Agent’s status subcommand and look for etcd under the Checks section.
etcd.debugging.mvcc.db.compaction.keys.total (count) | Total number of db keys compacted. Shown as key |
etcd.debugging.mvcc.db.compaction.pause.duration.milliseconds (gauge) | Bucketed histogram of db compaction pause duration. Shown as millisecond |
etcd.debugging.mvcc.db.compaction.total.duration.milliseconds (gauge) | Bucketed histogram of db compaction total duration. Shown as millisecond |
etcd.debugging.mvcc.db.total.size.in_bytes (gauge) | Total size of the underlying database in bytes. Shown as byte |
etcd.debugging.mvcc.delete.total (count) | Total number of deletes seen by this member. Shown as query |
etcd.debugging.mvcc.events.total (count) | Total number of events sent by this member. Shown as event |
etcd.debugging.mvcc.index.compaction.pause.duration.milliseconds (gauge) | Bucketed histogram of index compaction pause duration. Shown as millisecond |
etcd.debugging.mvcc.keys.total (gauge) | Total number of keys. Shown as key |
etcd.debugging.mvcc.pending.events.total (gauge) | Total number of pending events to be sent. Shown as event |
etcd.debugging.mvcc.put.total (count) | Total number of puts seen by this member. Shown as query |
etcd.debugging.mvcc.range.total (count) | Total number of ranges seen by this member. Shown as query |
etcd.debugging.mvcc.slow_watcher.total (gauge) | Total number of unsynced slow watchers. Shown as connection |
etcd.debugging.mvcc.txn.total (count) | Total number of txns seen by this member. Shown as transaction |
etcd.debugging.mvcc.watch_stream.total (gauge) | Total number of watch streams. Shown as connection |
etcd.debugging.mvcc.watcher.total (gauge) | Total number of watchers. Shown as connection |
etcd.debugging.server.lease.expired.total (count) | The total number of expired leases. Shown as item |
etcd.debugging.snap.save.marshalling.duration.seconds (gauge) | The marshalling cost distributions of save called by snapshot. Shown as second |
etcd.debugging.snap.save.total.duration.seconds (gauge) | The total latency distributions of save called by snapshot. Shown as second |
etcd.debugging.store.expires.total (count) | Total number of expired keys. Shown as key |
etcd.debugging.store.reads.total (count) | Total number of reads action by (get/getRecursive), local to this member. Shown as read |
etcd.debugging.store.watch.requests.total (count) | Total number of incoming watch requests (new or reestablished). Shown as request |
etcd.debugging.store.watchers (gauge) | Count of currently active watchers. Shown as connection |
etcd.debugging.store.writes.total (count) | Total number of writes (e.g. set/compareAndDelete) seen by this member. Shown as write |
etcd.disk.backend.commit.duration.seconds (gauge) | The latency distributions of commit called by backend. Shown as second |
etcd.disk.backend.snapshot.duration.seconds (gauge) | The latency distribution of backend snapshots. Shown as second |
etcd.disk.wal.fsync.duration.seconds.count (count) | The count of latency distributions of fsync called by wal. Shown as second |
etcd.disk.wal.fsync.duration.seconds.sum (gauge) | The sum of latency distributions of fsync called by wal. Shown as second |
etcd.disk.wal.write.bytes.total (gauge) | Total number of bytes written in WAL Shown as byte |
etcd.etcd.server.client.requests.total (count) | The total number of client requests per client version Shown as request |
etcd.go.gc.duration.seconds (gauge) | A summary of the GC invocation durations. Shown as second |
etcd.go.goroutines (gauge) | Number of goroutines that currently exist. Shown as thread |
etcd.go.info (gauge) | Information about the Go environment. Shown as item |
etcd.go.memstats.alloc.bytes (gauge) | Number of bytes allocated and still in use. Shown as byte |
etcd.go.memstats.alloc.bytes.total (count) | Total number of bytes allocated, even if freed. Shown as byte |
etcd.go.memstats.buck.hash.sys.bytes (gauge) | Number of bytes used by the profiling bucket hash table. Shown as byte |
etcd.go.memstats.frees.total (count) | Total number of frees. Shown as occurrence |
etcd.go.memstats.gc.cpu.fraction (gauge) | The fraction of this program's available CPU time used by the GC since the program started. Shown as cpu |
etcd.go.memstats.gc.sys.bytes (gauge) | Number of bytes used for garbage collection system metadata. Shown as byte |
etcd.go.memstats.heap.alloc.bytes (gauge) | Number of heap bytes allocated and still in use. Shown as byte |
etcd.go.memstats.heap.idle.bytes (gauge) | Number of heap bytes waiting to be used. Shown as byte |
etcd.go.memstats.heap.inuse.bytes (gauge) | Number of heap bytes that are in use. Shown as byte |
etcd.go.memstats.heap.objects (gauge) | Number of allocated objects. Shown as item |
etcd.go.memstats.heap.released.bytes (gauge) | Number of heap bytes released to OS. Shown as byte |
etcd.go.memstats.heap.sys.bytes (gauge) | Number of heap bytes obtained from system. Shown as byte |
etcd.go.memstats.last.gc.time.seconds (gauge) | Number of seconds since 1970 of last garbage collection. Shown as second |
etcd.go.memstats.lookups.total (count) | Total number of pointer lookups. Shown as occurrence |
etcd.go.memstats.mallocs.total (count) | Total number of mallocs. Shown as occurrence |
etcd.go.memstats.mcache.inuse.bytes (gauge) | Number of bytes in use by mcache structures. Shown as byte |
etcd.go.memstats.mcache.sys.bytes (gauge) | Number of bytes used for mcache structures obtained from system. Shown as byte |
etcd.go.memstats.mspan.inuse.bytes (gauge) | Number of bytes in use by mspan structures. Shown as byte |
etcd.go.memstats.mspan.sys.bytes (gauge) | Number of bytes used for mspan structures obtained from system. Shown as byte |
etcd.go.memstats.next.gc.bytes (gauge) | Number of heap bytes when next garbage collection will take place. Shown as byte |
etcd.go.memstats.other.sys.bytes (gauge) | Number of bytes used for other system allocations. Shown as byte |
etcd.go.memstats.stack.inuse.bytes (gauge) | Number of bytes in use by the stack allocator. Shown as byte |
etcd.go.memstats.stack.sys.bytes (gauge) | Number of bytes obtained from system for stack allocator. Shown as byte |
etcd.go.memstats.sys.bytes (gauge) | Number of bytes obtained from system. Shown as byte |
etcd.go.threads (gauge) | Number of OS threads created. Shown as thread |
etcd.grpc.proxy.cache.hits.total (gauge) | Total number of cache hits Shown as occurrence |
etcd.grpc.proxy.cache.keys.total (gauge) | Total number of keys/ranges cached Shown as item |
etcd.grpc.proxy.cache.misses.total (gauge) | Total number of cache misses Shown as occurrence |
etcd.grpc.proxy.events.coalescing.total (count) | Total number of events coalescing Shown as event |
etcd.grpc.proxy.watchers.coalescing.total (gauge) | Total number of current watchers coalescing Shown as connection |
etcd.grpc.server.handled.total (count) | Total number of RPCs completed on the server, regardless of success or failure. Shown as operation |
etcd.grpc.server.msg.received.total (count) | Total number of RPC stream messages received on the server. Shown as operation |
etcd.grpc.server.msg.sent.total (count) | Total number of gRPC stream messages sent by the server. Shown as operation |
etcd.grpc.server.started.total (count) | Total number of RPCs started on the server. Shown as operation |
etcd.leader.counts.fail (gauge) | Rate of failed Raft RPC requests (ETCD API V2 only) Shown as request |
etcd.leader.counts.success (gauge) | Rate of successful Raft RPC requests (ETCD API V2 only) Shown as request |
etcd.leader.latency.avg (gauge) | Average latency to each peer in the cluster (ETCD API V2 only) Shown as millisecond |
etcd.leader.latency.current (gauge) | Current latency to each peer in the cluster (ETCD API V2 only) Shown as millisecond |
etcd.leader.latency.max (gauge) | Maximum latency to each peer in the cluster (ETCD API V2 only) Shown as millisecond |
etcd.leader.latency.min (gauge) | Minimum latency to each peer in the cluster (ETCD API V2 only) Shown as millisecond |
etcd.leader.latency.stddev (gauge) | Standard deviation latency to each peer in the cluster (ETCD API V2 only) Shown as millisecond |
etcd.mvcc.db.total.size.in_use.bytes (gauge) | Total size of the underlying database logically in use Shown as byte |
etcd.network.active_peers (gauge) | The current number of active peer connections Shown as connection |
etcd.network.client.grpc.received.bytes.total (count) | The total number of bytes received from grpc clients. Shown as byte |
etcd.network.client.grpc.sent.bytes.total (count) | The total number of bytes sent to grpc clients. Shown as byte |
etcd.network.disconnected_peers.total (count) | The total number of disconnected peers Shown as connection |
etcd.network.peer.received.bytes.total (count) | The total number of bytes received from peers. Shown as byte |
etcd.network.peer.received.failures.total (count) | The total number of receive failures from peers Shown as event |
etcd.network.peer.round_trip_time.seconds (gauge) | Round-Trip-Time histogram between peers. Shown as second |
etcd.network.peer.sent.bytes.total (count) | The total number of bytes sent to peers. Shown as byte |
etcd.network.peer.sent.failures.total (count) | The total number of send failures from peers Shown as event |
etcd.network.snapshot.receive.failures.total (count) | Total number of snapshot receive failures Shown as event |
etcd.network.snapshot.receive.inflights.total (gauge) | Total number of inflight snapshot receives. Shown as event |
etcd.network.snapshot.receive.success.total (count) | Total number of successful snapshot receives Shown as event |
etcd.network.snapshot.receive.total.duration.seconds.count (gauge) | Total latency distributions of v3 snapshot receives Shown as second |
etcd.network.snapshot.receive.total.duration.seconds.sum (gauge) | Total latency distributions of v3 snapshot receives Shown as second |
etcd.network.snapshot.send.failures.total (count) | Total number of snapshot send failures. Shown as event |
etcd.network.snapshot.send.inflights.total (gauge) | Total number of inflight snapshot sends. Shown as event |
etcd.network.snapshot.send.sucess.total (count) | Total number of successful snapshot sends Shown as event |
etcd.network.snapshot.send.total.duration.seconds.count (gauge) | Total latency distributions of v3 snapshot sends Shown as second |
etcd.network.snapshot.send.total.duration.seconds.sum (gauge) | Total latency distributions of v3 snapshot sends Shown as second |
etcd.os.fd.limit (gauge) | The file descriptor limit Shown as object |
etcd.os.fd.used (gauge) | The number of used file descriptors Shown as object |
etcd.process.cpu.seconds.total (count) | Total user and system CPU time spent in seconds. Shown as cpu |
etcd.process.max.fds (gauge) | Maximum number of open file descriptors. Shown as item |
etcd.process.open.fds (gauge) | Number of open file descriptors. Shown as item |
etcd.process.resident.memory.bytes (gauge) | Resident memory size in bytes. Shown as byte |
etcd.process.start.time.seconds (gauge) | Start time of the process since unix epoch in seconds. Shown as second |
etcd.process.virtual.memory.bytes (gauge) | Virtual memory size in bytes. Shown as byte |
etcd.self.recv.appendrequest.count (gauge) | Rate of append requests this node has processed (ETCD API V2 only) Shown as request |
etcd.self.recv.bandwidthrate (gauge) | Rate of bytes received (ETCD API V2 only) Shown as byte |
etcd.self.recv.pkgrate (gauge) | Rate of packets received (ETCD API V2 only) Shown as packet |
etcd.self.send.appendrequest.count (gauge) | Rate of append requests this node has sent (ETCD API V2 only) Shown as request |
etcd.self.send.bandwidthrate (gauge) | Rate of bytes sent (ETCD API V2 only) Shown as byte |
etcd.self.send.pkgrate (gauge) | Rate of packets sent (ETCD API V2 only) Shown as packet |
etcd.server.apply.slow.total (count) | The total number of slow apply requests (likely overloaded from slow disk) Shown as request |
etcd.server.go_version (gauge) | Which Go version server is running with. 1 with label with current version Shown as unit |
etcd.server.has_leader (gauge) | Whether or not a leader exists. 1 is existence, 0 is not. Shown as check |
etcd.server.health.failures.total (count) | The total number of failed health checks Shown as event |
etcd.server.health.success.total (count) | The total number of successful health checks Shown as event |
etcd.server.heartbeat.send.failures.total (count) | The total number of leader heartbeat send failures (likely overloaded from slow disk) Shown as event |
etcd.server.is_leader (gauge) | Whether or not this member is a leader. 1 if is, 0 otherwise. Shown as check |
etcd.server.leader.changes.seen.total (count) | The number of leader changes seen. Shown as event |
etcd.server.lease.expired.total (count) | The total number of expired leases Shown as occurrence |
etcd.server.proposals.applied.total (gauge) | The total number of consensus proposals applied. Shown as occurrence |
etcd.server.proposals.committed.total (gauge) | The total number of consensus proposals committed. Shown as occurrence |
etcd.server.proposals.failed.total (count) | The total number of failed proposals seen. Shown as occurrence |
etcd.server.proposals.pending (gauge) | The current number of pending proposals to commit. Shown as occurrence |
etcd.server.quota.backend.bytes (gauge) | Current backend storage quota size in bytes Shown as byte |
etcd.server.read_indexes.failed.total (count) | The total number of failed read indexes seen Shown as event |
etcd.server.read_indexes.slow.total (count) | The total number of pending read indexes not in sync with leader or timed out read index requests Shown as event |
etcd.server.version (gauge) | Which version is running. 1 for 'server_version' label with current version. Shown as item |
etcd.snap.db.fsync.duration.seconds.count (gauge) | The latency distributions of fsyncing .snap.db file Shown as second |
etcd.snap.db.fsync.duration.seconds.sum (gauge) | The latency distributions of fsyncing .snap.db file Shown as second |
etcd.snap.db.save.total.duration.seconds.count (gauge) | The total latency distributions of v3 snapshot save Shown as second |
etcd.snap.db.save.total.duration.seconds.sum (gauge) | The total latency distributions of v3 snapshot save Shown as second |
etcd.snap.fsync.duration.seconds.count (gauge) | The latency distributions of fsync called by snap Shown as second |
etcd.snap.fsync.duration.seconds.sum (gauge) | The latency distributions of fsync called by snap Shown as second |
etcd.store.compareanddelete.fail (gauge) | Rate of compare and delete requests failure (ETCD API V2 only) Shown as request |
etcd.store.compareanddelete.success (gauge) | Rate of compare and delete requests success (ETCD API V2 only) Shown as request |
etcd.store.compareandswap.fail (gauge) | Rate of compare and swap requests failure (ETCD API V2 only) Shown as request |
etcd.store.compareandswap.success (gauge) | Rate of compare and swap requests success (ETCD API V2 only) Shown as request |
etcd.store.create.fail (gauge) | Rate of failed create requests (ETCD API V2 only) Shown as request |
etcd.store.create.success (gauge) | Rate of successful create requests (ETCD API V2 only) Shown as request |
etcd.store.delete.fail (gauge) | Rate of failed delete requests (ETCD API V2 only) Shown as request |
etcd.store.delete.success (gauge) | Rate of successful delete requests (ETCD API V2 only) Shown as request |
etcd.store.expire.count (gauge) | Rate of expired keys (ETCD API V2 only) Shown as eviction |
etcd.store.gets.fail (gauge) | Rate of failed get requests (ETCD API V2 only) Shown as request |
etcd.store.gets.success (gauge) | Rate of successful get requests (ETCD API V2 only) Shown as request |
etcd.store.sets.fail (gauge) | Rate of failed set requests (ETCD API V2 only) Shown as request |
etcd.store.sets.success (gauge) | Rate of successful set requests (ETCD API V2 only) Shown as request |
etcd.store.update.fail (gauge) | Rate of failed update requests (ETCD API V2 only) Shown as request |
etcd.store.update.success (gauge) | Rate of successful update requests (ETCD API V2 only) Shown as request |
etcd.store.watchers (gauge) | Rate of watchers (ETCD API V2 only) |
Etcd metrics are tagged with etcd_state:leader or etcd_state:follower, depending on the node status, so you can easily aggregate metrics by status.
The Etcd check does not include any events.
etcd.can_connect
Returns CRITICAL if the check is unable to get metrics from etcd (timeout or non-200 HTTP code). This service check is only available on the legacy version of the etcd check.
Statuses: ok, critical
etcd.healthy
Returns CRITICAL when a member is unhealthy. This service check is only available on the legacy version of the etcd check.
Statuses: ok, critical, unknown
etcd.prometheus.health
Returns CRITICAL if the check cannot access a metrics endpoint. Otherwise, returns OK. This service check is only available when use_preview is enabled.
Statuses: ok, critical
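To illustrate the use_preview condition above, the sketch below enables it on an instance. It assumes use_preview is an instance-level option in etcd.d/conf.yaml and that the metrics endpoint is on localhost:2379; confirm both against the sample etcd.d/conf.yaml shipped with your Agent version.

```yaml
# etcd.d/conf.yaml (sketch; option placement and address are assumptions)
init_config:

instances:
  - prometheus_url: http://localhost:2379/metrics
    use_preview: true
```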
Need help? Contact Datadog support.