Consul

Supported OS: Linux, Windows, macOS

Integration version: 2.6.1

Overview

The Datadog Agent collects many metrics from Consul nodes, including:

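  • Total Consul peers
  • Service health - for a given service, how many of its nodes are UP, PASSING, WARNING, or CRITICAL
  • Node health - for a given node, how many of its services are UP, PASSING, WARNING, or CRITICAL
  • Network coordinates - inter- and intra-datacenter latencies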
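As an illustration of how per-service counts like the ones above can be derived, the following Python sketch groups health checks by service and node and counts each node once, under its worst status. It is a minimal sketch, not the check's actual implementation. It assumes input shaped like the JSON returned by Consul's /v1/health/state/any endpoint (only the Node, ServiceName, and Status fields are used), assumes the only statuses are passing, warning, and critical, and the function name service_node_counts is hypothetical.

```python
from collections import Counter, defaultdict

# Assumed ordering used to pick the worst status per (service, node) pair.
SEVERITY = {"passing": 0, "warning": 1, "critical": 2}


def service_node_counts(checks):
    """Count, per service, how many nodes are up, passing, warning, or critical.

    `checks` is a list of dicts shaped like /v1/health/state/any entries.
    Node-level checks (for example serfHealth) have an empty ServiceName
    and are skipped.
    """
    worst = {}
    for check in checks:
        service = check.get("ServiceName")
        if not service:
            continue
        key = (service, check["Node"])
        status = check["Status"]
        # Keep the most severe status seen so far for this node and service.
        if key not in worst or SEVERITY[status] > SEVERITY[worst[key]]:
            worst[key] = status

    counts = defaultdict(Counter)
    for (service, _node), status in worst.items():
        counts[service]["up"] += 1  # every node that reports the service
        counts[service][status] += 1
    return {service: dict(counter) for service, counter in counts.items()}
```

With this rule, a node that has one passing and one warning check for the same service is counted once, as warning.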

The Consul Agent can provide further metrics via DogStatsD. These metrics relate more to the internal health of Consul itself than to the services that depend on Consul. They include:

  • Serf events and member flaps
  • The Raft protocol
  • DNS performance

And many more.

In addition to metrics, the Datadog Agent sends a service check for each of Consul's health checks, and an event after each new leader election.

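Setup

Installation

The Datadog Agent's Consul check is included in the Datadog Agent package, so you don't need to install anything else on your Consul nodes.

Configuration

Host

To configure this check for an Agent running on a host:

Metric collection
  1. To start collecting your Consul metrics, edit the consul.d/conf.yaml file in the conf.d/ folder at the root of your Agent's configuration directory. See the sample consul.d/conf.yaml for all available configuration options.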

    init_config:
    
    instances:
      ## @param url - string - required
      ## Where your Consul HTTP server lives,
      ## point the URL at the leader to get metrics about your Consul cluster.
      ## Use HTTPS instead of HTTP if your Consul setup is configured to do so.
      #
      - url: http://localhost:8500
    
  2. Restart the Agent.

OpenMetrics

(Optional) Enable the use_prometheus_endpoint configuration option to get an additional set of metrics from the Consul Prometheus endpoint.

Note: Use either the DogStatsD method or the Prometheus method; do not enable both for the same instance.

  1. Configure Consul to expose metrics on the Prometheus endpoint. Set prometheus_retention_time, nested under the top-level telemetry key of the main Consul configuration file:

    {
      ...
      "telemetry": {
        "prometheus_retention_time": "360h"
      },
      ...
    }
    
  2. To start using the Prometheus endpoint, edit the consul.d/conf.yaml file in the conf.d/ folder at the root of your Agent's configuration directory:

    instances:
        - url: <EXAMPLE>
          use_prometheus_endpoint: true
    
  3. Restart the Agent.
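To confirm that Consul is exposing Prometheus metrics before the Agent scrapes them, you can query Consul's /v1/agent/metrics endpoint with format=prometheus. The Python sketch below assumes the HTTP API is reachable at http://localhost:8500 without TLS or an ACL token (adjust both for your setup); the helper names fetch_metrics and metric_families are hypothetical. It lists the metric families declared by "# TYPE" lines in the Prometheus text response.

```python
import urllib.request

# Assumption: Consul's HTTP API on localhost:8500, plain HTTP, no ACL token.
METRICS_URL = "http://localhost:8500/v1/agent/metrics?format=prometheus"


def fetch_metrics(url=METRICS_URL, timeout=5):
    """Download the Prometheus text exposition from the Consul agent."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def metric_families(exposition_text, prefix="consul_"):
    """Return the sorted metric family names declared with '# TYPE' lines."""
    names = set()
    for line in exposition_text.splitlines():
        parts = line.split()
        # Prometheus text format declares types as: "# TYPE <name> <type>"
        if len(parts) >= 4 and parts[0] == "#" and parts[1] == "TYPE":
            if parts[2].startswith(prefix):
                names.add(parts[2])
    return sorted(names)
```

For example, calling metric_families(fetch_metrics()) on a Consul node should return a non-empty list of consul_ metric names once prometheus_retention_time is in effect; an empty list or an HTTP error suggests the telemetry setting has not taken effect yet.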

DogStatsD

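Instead of using the Prometheus endpoint, you can configure Consul to send the same set of additional metrics to the Agent through DogStatsD.

  1. To configure Consul to send DogStatsD metrics, add dogstatsd_addr nested under the top-level telemetry key in the main Consul configuration file: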

    {
      ...
      "telemetry": {
        "dogstatsd_addr": "127.0.0.1:8125"
      },
      ...
    }
    
  2. To ensure metrics are tagged correctly, update the Datadog Agent main configuration file, datadog.yaml, with the following configuration:

    # dogstatsd_mapper_cache_size: 1000  # defaults to 1000
    dogstatsd_mapper_profiles:
      - name: consul
        prefix: "consul."
        mappings:
          - match: 'consul\.http\.([a-zA-Z]+)\.(.*)'
            match_type: "regex"
            name: "consul.http.request"
            tags:
              method: "$1"
              path: "$2"
          - match: 'consul\.raft\.replication\.appendEntries\.logs\.([0-9a-f-]+)'
            match_type: "regex"
            name: "consul.raft.replication.appendEntries.logs"
            tags:
              peer_id: "$1"
          - match: 'consul\.raft\.replication\.appendEntries\.rpc\.([0-9a-f-]+)'
            match_type: "regex"
            name: "consul.raft.replication.appendEntries.rpc"
            tags:
              peer_id: "$1"
          - match: 'consul\.raft\.replication\.heartbeat\.([0-9a-f-]+)'
            match_type: "regex"
            name: "consul.raft.replication.heartbeat"
            tags:
              peer_id: "$1"
    
  3. Restart the Agent.
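To see what the mapper profile above does to an incoming metric name, the following Python sketch applies the same four regex rules with Python's re module. It is only an illustration: the Agent's own mapper is separate code, and this sketch assumes each pattern must match the whole metric name and that $1, $2 in a tag value refer to numbered capture groups. The MAPPINGS table and the map_metric function are hypothetical names.

```python
import re

# (match, name, tags) triples copied from the consul mapper profile above.
MAPPINGS = [
    (r"consul\.http\.([a-zA-Z]+)\.(.*)",
     "consul.http.request", {"method": "$1", "path": "$2"}),
    (r"consul\.raft\.replication\.appendEntries\.logs\.([0-9a-f-]+)",
     "consul.raft.replication.appendEntries.logs", {"peer_id": "$1"}),
    (r"consul\.raft\.replication\.appendEntries\.rpc\.([0-9a-f-]+)",
     "consul.raft.replication.appendEntries.rpc", {"peer_id": "$1"}),
    (r"consul\.raft\.replication\.heartbeat\.([0-9a-f-]+)",
     "consul.raft.replication.heartbeat", {"peer_id": "$1"}),
]


def map_metric(raw_name):
    """Return (name, tags) for the first matching rule, or (raw_name, {})."""
    for pattern, name, tag_templates in MAPPINGS:
        match = re.fullmatch(pattern, raw_name)
        if match:
            tags = {}
            for key, template in tag_templates.items():
                # "$1" -> first capture group, "$2" -> second, and so on.
                tags[key] = match.group(int(template.lstrip("$")))
            return name, tags
    return raw_name, {}
```

For example, map_metric("consul.http.GET.v1.kv._") returns ("consul.http.request", {"method": "GET", "path": "v1.kv._"}), while a name no rule matches, such as consul.raft.apply, passes through unchanged with no extra tags.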

Log collection

Available for Agent versions 6.0 and later

  1. Collecting logs is disabled by default in the Datadog Agent. Enable it in datadog.yaml:

    logs_enabled: true
    
  2. Edit this configuration block in your consul.yaml file to collect Consul logs:

    logs:
      - type: file
        path: /var/log/consul_server.log
        source: consul
        service: myservice
    

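    Change the path and service parameter values to match your environment. See the sample consul.d/conf.yaml for all available configuration options.

  3. Restart the Agent.

Containerized

For containerized environments, the Autodiscovery Integration Templates explain how to apply the parameters below.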

Metric collection

Parameter            Value
<INTEGRATION_NAME>   consul
<INIT_CONFIG>        blank or {}
<INSTANCE_CONFIG>    {"url": "https://%%host%%:8500"}

Log collection

Available for Agent versions 6.0 and later

Collecting logs is disabled by default in the Datadog Agent. To enable it, see Kubernetes Log Collection.

Parameter       Value
<LOG_CONFIG>    {"source": "consul", "service": "<SERVICE_NAME>"}

Validation

Run the Agent's status subcommand and look for consul under the Checks section.

Note: If your Consul nodes have debug logging enabled, the Datadog Agent's regular polling shows up in the Consul log:

2017/03/27 21:38:12 [DEBUG] http: Request GET /v1/status/leader (59.344us) from=127.0.0.1:53768
2017/03/27 21:38:12 [DEBUG] http: Request GET /v1/status/peers (62.678us) from=127.0.0.1:53770
2017/03/27 21:38:12 [DEBUG] http: Request GET /v1/health/state/any (106.725us) from=127.0.0.1:53772
2017/03/27 21:38:12 [DEBUG] http: Request GET /v1/catalog/services (79.657us) from=127.0.0.1:53774
2017/03/27 21:38:12 [DEBUG] http: Request GET /v1/health/service/consul (153.917us) from=127.0.0.1:53776
2017/03/27 21:38:12 [DEBUG] http: Request GET /v1/coordinate/datacenters (71.778us) from=127.0.0.1:53778
2017/03/27 21:38:12 [DEBUG] http: Request GET /v1/coordinate/nodes (84.95us) from=127.0.0.1:53780
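To check which endpoints the Agent is polling, you can filter the Consul debug log for request lines like these. The Python sketch below assumes the log line format shown in the sample above (newer Consul versions may log HTTP requests in a different format); the LINE_RE pattern and the polled_endpoints function are hypothetical.

```python
import re

# Matches lines in the format shown above, for example:
# 2017/03/27 21:38:12 [DEBUG] http: Request GET /v1/status/leader (59.344us) from=...
LINE_RE = re.compile(r"\[DEBUG\] http: Request (?P<method>[A-Z]+) (?P<path>\S+)")


def polled_endpoints(log_lines):
    """Return the set of (method, path) pairs found in Consul debug HTTP lines."""
    seen = set()
    for line in log_lines:
        match = LINE_RE.search(line)
        if match:
            seen.add((match.group("method"), match.group("path")))
    return seen
```

Fed the lines of a Consul log file, the result should include paths such as /v1/status/leader and /v1/health/state/any if the Agent is polling that node.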

Consul Agent to DogStatsD

Use netstat to verify that Consul is sending its metrics as well:

$ sudo netstat -nup | grep "127.0.0.1:8125.*ESTABLISHED"
udp        0      0 127.0.0.1:53874         127.0.0.1:8125          ESTABLISHED 23176/consul

Data Collected

Metrics

consul.catalog.nodes_critical
(gauge)
[Integration] The number of nodes with service status critical from those registered
Shown as node
consul.catalog.nodes_passing
(gauge)
[Integration] The number of nodes with service status passing from those registered
Shown as node
consul.catalog.nodes_up
(gauge)
[Integration] The number of nodes
Shown as node
consul.catalog.nodes_warning
(gauge)
[Integration] The number of nodes with service status warning from those registered
Shown as node
consul.catalog.services_count
(gauge)
[Integration] Counts the number of services matching criteria such as the service tag, node name, or status. Query it using the sum by aggregator.
Shown as service
consul.catalog.services_critical
(gauge)
[Integration] Total critical services on nodes
Shown as service
consul.catalog.services_passing
(gauge)
[Integration] Total passing services on nodes
Shown as service
consul.catalog.services_up
(gauge)
[Integration] Total services registered on nodes
Shown as service
consul.catalog.services_warning
(gauge)
[Integration] Total warning services on nodes
Shown as service
consul.catalog.total_nodes
(gauge)
[Integration] The number of nodes registered in the consul cluster
Shown as node
consul.client.rpc
(count)
[DogStatsD] [Prometheus] This increments whenever a Consul agent in client mode makes an RPC request to a Consul server. This gives a measure of how much a given agent is loading the Consul servers. This is only generated by agents in client mode, not Consul servers.
Shown as request
consul.client.rpc.failed
(count)
[DogStatsD] [Prometheus] Increments whenever a Consul agent in client mode makes an RPC request to a Consul server and fails
Shown as request
consul.http.request
(gauge)
[DogStatsD] Tracks how long it takes to service the given HTTP request for the given verb and path. Using a DogStatsD mapper as described in the DogStatsD setup section, the paths are mapped to tags and do not include details like service or key names. For these paths, an underscore is present as a placeholder, for example: method:GET, path:v1.kv._
Shown as millisecond
consul.http.request.count
(count)
[Prometheus] The number of observations of how long it takes to service the given HTTP request for the given verb and path. It includes labels for path and method. Path does not include details like service or key names. For these paths, an underscore is present as a placeholder, for example: path=v1.kv._
Shown as millisecond
consul.http.request.quantile
(gauge)
[Prometheus] A quantile of how long it takes to service the given HTTP request for the given verb and path. Includes labels for path and method. Path does not include details like service or key names. For these paths, an underscore is present as a placeholder, for example: path=v1.kv._
Shown as millisecond
consul.http.request.sum
(count)
[Prometheus] The sum of how long it takes to service the given HTTP request for the given verb and path. Includes labels for path and method. Path does not include details like service or key names. For these paths, an underscore is present as a placeholder, for example: path=v1.kv._
Shown as millisecond
consul.memberlist.degraded.probe
(gauge)
[DogStatsD] [Prometheus] This metric counts the number of times the Consul agent has performed failure detection on another agent at a slower probe rate. The agent uses its own health metric as an indicator to perform this action. If its health score is low, it means that the node is healthy, and vice versa.
consul.memberlist.gossip.95percentile
(gauge)
[DogStatsD] The p95 for the number of gossips (messages) broadcasted to a set of randomly selected nodes.
Shown as message
consul.memberlist.gossip.avg
(gauge)
[DogStatsD] The avg for the number of gossips (messages) broadcasted to a set of randomly selected nodes.
Shown as message
consul.memberlist.gossip.count
(count)
[DogStatsD] [Prometheus] The number of samples of consul.memberlist.gossip
consul.memberlist.gossip.max
(gauge)
[DogStatsD] The max for the number of gossips (messages) broadcasted to a set of randomly selected nodes.
Shown as message
consul.memberlist.gossip.median
(gauge)
[DogStatsD] The median for the number of gossips (messages) broadcasted to a set of randomly selected nodes.
Shown as message
consul.memberlist.gossip.quantile
(gauge)
[Prometheus] The quantile for the number of gossips (messages) broadcasted to a set of randomly selected nodes.
Shown as message
consul.memberlist.gossip.sum
(count)
[DogStatsD] [Prometheus] The sum of the number of gossips (messages) broadcasted to a set of randomly selected nodes.
Shown as message
consul.memberlist.health.score
(gauge)
[DogStatsD] [Prometheus] This metric describes a node's perception of its own health based on how well it is meeting the soft real-time requirements of the protocol. This metric ranges from 0 to 8, where 0 indicates "totally healthy". For more details see section IV of the Lifeguard paper: https://arxiv.org/pdf/1707.00788.pdf
consul.memberlist.msg.alive
(count)
[DogStatsD] [Prometheus] This metric counts the number of alive Consul agents that the agent has mapped out so far, based on the message information given by the network layer.
consul.memberlist.msg.dead
(count)
[DogStatsD] [Prometheus] This metric counts the number of times a Consul agent has marked another agent to be a dead node.
Shown as message
consul.memberlist.msg.suspect
(count)
[DogStatsD] [Prometheus] The number of times a Consul agent suspects another as failed while probing during gossip protocol
consul.memberlist.probenode.95percentile
(gauge)
[DogStatsD] The p95 for the time taken to perform a single round of failure detection on a select Consul agent.
Shown as node
consul.memberlist.probenode.avg
(gauge)
[DogStatsD] The avg for the time taken to perform a single round of failure detection on a select Consul agent.
Shown as node
consul.memberlist.probenode.count
(count)
[DogStatsD] [Prometheus] The number of samples of consul.memberlist.probenode
consul.memberlist.probenode.max
(gauge)
[DogStatsD] The max for the time taken to perform a single round of failure detection on a select Consul agent.
Shown as node
consul.memberlist.probenode.median
(gauge)
[DogStatsD] The median for the time taken to perform a single round of failure detection on a select Consul agent.
Shown as node
consul.memberlist.probenode.quantile
(gauge)
[Prometheus] The quantile for the time taken to perform a single round of failure detection on a select Consul agent.
Shown as node
consul.memberlist.probenode.sum
(count)
[DogStatsD] [Prometheus] The sum for the time taken to perform a single round of failure detection on a select Consul agent.
Shown as node
consul.memberlist.pushpullnode.95percentile
(gauge)
[DogStatsD] The p95 for the number of Consul agents that have exchanged state with this agent.
Shown as node
consul.memberlist.pushpullnode.avg
(gauge)
[DogStatsD] The avg for the number of Consul agents that have exchanged state with this agent.
Shown as node
consul.memberlist.pushpullnode.count
(count)
[DogStatsD] [Prometheus] The number of samples of consul.memberlist.pushpullnode
consul.memberlist.pushpullnode.max
(gauge)
[DogStatsD] The max for the number of Consul agents that have exchanged state with this agent.
Shown as node
consul.memberlist.pushpullnode.median
(gauge)
[DogStatsD] The median for the number of Consul agents that have exchanged state with this agent.
Shown as node
consul.memberlist.pushpullnode.quantile
(gauge)
[Prometheus] The quantile for the number of Consul agents that have exchanged state with this agent.
consul.memberlist.pushpullnode.sum
(count)
[DogStatsD] [Prometheus] The sum for the number of Consul agents that have exchanged state with this agent.
consul.memberlist.tcp.accept
(count)
[DogStatsD] [Prometheus] This metric counts the number of times a Consul agent has accepted an incoming TCP stream connection.
Shown as connection
consul.memberlist.tcp.connect
(count)
[DogStatsD] [Prometheus] This metric counts the number of times a Consul agent has initiated a push/pull sync with another agent.
Shown as connection
consul.memberlist.tcp.sent
(count)
[DogStatsD] [Prometheus] This metric measures the total number of bytes sent by a Consul agent through the TCP protocol
Shown as byte
consul.memberlist.udp.received
(count)
[DogStatsD] [Prometheus] This metric measures the total number of bytes received by a Consul agent through the UDP protocol.
Shown as byte
consul.memberlist.udp.sent
(count)
[DogStatsD] [Prometheus] This metric measures the total number of bytes sent by a Consul agent through the UDP protocol.
Shown as byte
consul.net.node.latency.max
(gauge)
[Integration] Maximum latency from this node to all others
Shown as millisecond
consul.net.node.latency.median
(gauge)
[Integration] Median latency from this node to all others
Shown as millisecond
consul.net.node.latency.min
(gauge)
[Integration] Minimum latency from this node to all others
Shown as millisecond
consul.net.node.latency.p25
(gauge)
[Integration] P25 latency from this node to all others
Shown as millisecond
consul.net.node.latency.p75
(gauge)
[Integration] P75 latency from this node to all others
Shown as millisecond
consul.net.node.latency.p90
(gauge)
[Integration] P90 latency from this node to all others
Shown as millisecond
consul.net.node.latency.p95
(gauge)
[Integration] P95 latency from this node to all others
Shown as millisecond
consul.net.node.latency.p99
(gauge)
[Integration] P99 latency from this node to all others
Shown as millisecond
consul.peers
(gauge)
[Integration] The number of peers in the peer set
consul.raft.apply
(count)
[DogStatsD] [Prometheus] The number of raft transactions occurring
Shown as transaction
consul.raft.commitTime.95percentile
(gauge)
[DogStatsD] The p95 time it takes to commit a new entry to the raft log on the leader
Shown as millisecond
consul.raft.commitTime.avg
(gauge)
[DogStatsD] The average time it takes to commit a new entry to the raft log on the leader
Shown as millisecond
consul.raft.commitTime.count
(count)
[DogStatsD] [Prometheus] The number of samples of raft.commitTime
consul.raft.commitTime.max
(gauge)
[DogStatsD] The max time it takes to commit a new entry to the raft log on the leader
Shown as millisecond
consul.raft.commitTime.median
(gauge)
[DogStatsD] The median time it takes to commit a new entry to the raft log on the leader
Shown as millisecond
consul.raft.commitTime.quantile
(gauge)
[Prometheus] The quantile time it takes to commit a new entry to the raft log on the leader
Shown as millisecond
consul.raft.commitTime.sum
(count)
[DogStatsD] [Prometheus] The sum of the time it takes to commit a new entry to the raft log on the leader
Shown as millisecond
consul.raft.leader.dispatchLog.95percentile
(gauge)
[DogStatsD] The p95 time it takes for the leader to write log entries to disk
Shown as millisecond
consul.raft.leader.dispatchLog.avg
(gauge)
[DogStatsD] The average time it takes for the leader to write log entries to disk
Shown as millisecond
consul.raft.leader.dispatchLog.count
(count)
[DogStatsD] [Prometheus] The number of samples of raft.leader.dispatchLog
consul.raft.leader.dispatchLog.max
(gauge)
[DogStatsD] The max time it takes for the leader to write log entries to disk
Shown as millisecond
consul.raft.leader.dispatchLog.median
(gauge)
[DogStatsD] The median time it takes for the leader to write log entries to disk
Shown as millisecond
consul.raft.leader.dispatchLog.quantile
(gauge)
[Prometheus] The quantile time it takes for the leader to write log entries to disk
Shown as millisecond
consul.raft.leader.dispatchLog.sum
(count)
[DogStatsD] [Prometheus] The sum of the time it takes for the leader to write log entries to disk
Shown as millisecond
consul.raft.leader.lastContact.95percentile
(gauge)
[DogStatsD] The p95 time elapsed since the leader was last able to check its lease with followers
Shown as millisecond
consul.raft.leader.lastContact.avg
(gauge)
[DogStatsD] The average time elapsed since the leader was last able to check its lease with followers
Shown as millisecond
consul.raft.leader.lastContact.count
(count)
[DogStatsD] [Prometheus] The number of samples of raft.leader.lastContact
consul.raft.leader.lastContact.max
(gauge)
[DogStatsD] The max time elapsed since the leader was last able to check its lease with followers
Shown as millisecond
consul.raft.leader.lastContact.median
(gauge)
[DogStatsD] The median time elapsed since the leader was last able to check its lease with followers
Shown as millisecond
consul.raft.leader.lastContact.quantile
(gauge)
[Prometheus] The quantile time elapsed since the leader was last able to check its lease with followers
Shown as millisecond
consul.raft.leader.lastContact.sum
(count)
[DogStatsD] [Prometheus] The sum of the time elapsed since the leader was last able to check its lease with followers
Shown as millisecond
consul.raft.replication.appendEntries.logs
(count)
[DogStatsD] [Prometheus] Measures the number of logs replicated to an agent, to bring it up to speed with the leader's logs.
Shown as entry
consul.raft.replication.appendEntries.rpc.count
(count)
[DogStatsD] [Prometheus] The count of the time taken by the append entries RPC to replicate the log entries of a leader agent onto its follower agent(s)
Shown as millisecond
consul.raft.replication.appendEntries.rpc.quantile
(gauge)
[Prometheus] The quantile of the time taken by the append entries RPC to replicate the log entries of a leader agent onto its follower agent(s)
Shown as millisecond
consul.raft.replication.appendEntries.rpc.sum
(count)
[DogStatsD] [Prometheus] The sum of the time taken by the append entries RPC to replicate the log entries of a leader agent onto its follower agent(s)
Shown as millisecond
consul.raft.replication.heartbeat.count
(count)
[DogStatsD] [Prometheus] The count of the time taken to invoke appendEntries on a peer.
Shown as millisecond
consul.raft.replication.heartbeat.quantile
(gauge)
[Prometheus] The quantile of the time taken to invoke appendEntries on a peer.
Shown as millisecond
consul.raft.replication.heartbeat.sum
(count)
[DogStatsD] [Prometheus] The sum of the time taken to invoke appendEntries on a peer.
Shown as millisecond
consul.raft.state.candidate
(count)
[DogStatsD] [Prometheus] The number of initiated leader elections
Shown as event
consul.raft.state.leader
(count)
[DogStatsD] [Prometheus] The number of completed leader elections
Shown as event
consul.runtime.gc_pause_ns.95percentile
(gauge)
[DogStatsD] The p95 for the number of nanoseconds consumed by stop-the-world garbage collection (GC) pauses since Consul started.
Shown as nanosecond
consul.runtime.gc_pause_ns.avg
(gauge)
[DogStatsD] The avg for the number of nanoseconds consumed by stop-the-world garbage collection (GC) pauses since Consul started.
Shown as nanosecond
consul.runtime.gc_pause_ns.count
(count)
[DogStatsD] [Prometheus] The number of samples of consul.runtime.gc_pause_ns
consul.runtime.gc_pause_ns.max
(gauge)
[DogStatsD] The max for the number of nanoseconds consumed by stop-the-world garbage collection (GC) pauses since Consul started.
Shown as nanosecond
consul.runtime.gc_pause_ns.median
(gauge)
[DogStatsD] The median for the number of nanoseconds consumed by stop-the-world garbage collection (GC) pauses since Consul started.
Shown as nanosecond
consul.runtime.gc_pause_ns.quantile
(gauge)
[Prometheus] The quantile of nanoseconds consumed by stop-the-world garbage collection (GC) pauses since Consul started.
Shown as nanosecond
consul.runtime.gc_pause_ns.sum
(count)
[DogStatsD] [Prometheus] The sum of nanoseconds consumed by stop-the-world garbage collection (GC) pauses since Consul started.
Shown as nanosecond
consul.serf.coordinate.adjustment_ms.95percentile
(gauge)
[DogStatsD] The p95 in milliseconds for the node coordinate adjustment
Shown as millisecond
consul.serf.coordinate.adjustment_ms.avg
(gauge)
[DogStatsD] The avg in milliseconds for the node coordinate adjustment
Shown as millisecond
consul.serf.coordinate.adjustment_ms.count
(count)
[DogStatsD] [Prometheus] The number of samples of consul.serf.coordinate.adjustment_ms
consul.serf.coordinate.adjustment_ms.max
(gauge)
[DogStatsD] The max in milliseconds for the node coordinate adjustment
Shown as millisecond
consul.serf.coordinate.adjustment_ms.median
(gauge)
[DogStatsD] The median in milliseconds for the node coordinate adjustment
Shown as millisecond
consul.serf.coordinate.adjustment_ms.quantile
(gauge)
[Prometheus] The quantile in milliseconds for the node coordinate adjustment
Shown as millisecond
consul.serf.coordinate.adjustment_ms.sum
(count)
[DogStatsD] [Prometheus] The sum in milliseconds for the node coordinate adjustment
Shown as millisecond
consul.serf.events
(count)
[DogStatsD] [Prometheus] This increments when a Consul agent processes a serf event
Shown as event
consul.serf.member.failed
(count)
[DogStatsD] [Prometheus] This increments when a Consul agent is marked dead. This can be an indicator of overloaded agents, network problems, or configuration errors where agents cannot connect to each other on the required ports.
consul.serf.member.flap
(count)
[DogStatsD] [Prometheus] The number of times a Consul agent is marked dead and then quickly recovers
consul.serf.member.join
(count)
[DogStatsD] [Prometheus] This increments when a Consul agent processes a join event
Shown as event
consul.serf.member.left
(count)
[DogStatsD] [Prometheus] This increments when a Consul agent leaves the cluster.
consul.serf.member.update
(count)
[DogStatsD] [Prometheus] This increments when a Consul agent updates.
consul.serf.msgs.received.95percentile
(gauge)
[DogStatsD] The p95 for the number of serf messages received
Shown as message
consul.serf.msgs.received.avg
(gauge)
[DogStatsD] The avg for the number of serf messages received
Shown as message
consul.serf.msgs.received.count
(count)
[DogStatsD] [Prometheus] The count of serf messages received
consul.serf.msgs.received.max
(gauge)
[DogStatsD] The max for the number of serf messages received
Shown as message
consul.serf.msgs.received.median
(gauge)
[DogStatsD] The median for the number of serf messages received
Shown as message
consul.serf.msgs.received.quantile
(gauge)
[Prometheus] The quantile for the number of serf messages received
Shown as message
consul.serf.msgs.received.sum
(count)
[DogStatsD] [Prometheus] The sum for the number of serf messages received
Shown as message
consul.serf.msgs.sent.95percentile
(gauge)
[DogStatsD] The p95 for the number of serf messages sent
Shown as message
consul.serf.msgs.sent.avg
(gauge)
[DogStatsD] The avg for the number of serf messages sent
Shown as message
consul.serf.msgs.sent.count
(count)
[DogStatsD] [Prometheus] The count of serf messages sent
consul.serf.msgs.sent.max
(gauge)
[DogStatsD] The max for the number of serf messages sent
Shown as message
consul.serf.msgs.sent.median
(gauge)
[DogStatsD] The median for the number of serf messages sent
Shown as message
consul.serf.msgs.sent.quantile
(gauge)
[Prometheus] The quantile for the number of serf messages sent
Shown as message
consul.serf.msgs.sent.sum
(count)
[DogStatsD] [Prometheus] The sum of the number of serf messages sent
Shown as message
consul.serf.queue.event.95percentile
(gauge)
[DogStatsD] The p95 for the size of the serf event queue
consul.serf.queue.event.avg
(gauge)
[DogStatsD] The avg size of the serf event queue
consul.serf.queue.event.count
(count)
[DogStatsD] [Prometheus] The number of items in the serf event queue
consul.serf.queue.event.max
(gauge)
[DogStatsD] The max size of the serf event queue
consul.serf.queue.event.median
(gauge)
[DogStatsD] The median size of the serf event queue
consul.serf.queue.event.quantile
(gauge)
[Prometheus] The quantile for the size of the serf event queue
consul.serf.queue.intent.95percentile
(gauge)
[DogStatsD] The p95 for the size of the serf intent queue
consul.serf.queue.intent.avg
(gauge)
[DogStatsD] The avg size of the serf intent queue
consul.serf.queue.intent.count
(count)
[DogStatsD] [Prometheus] The number of items in the serf intent queue
consul.serf.queue.intent.max
(gauge)
[DogStatsD] The max size of the serf intent queue
consul.serf.queue.intent.median
(gauge)
[DogStatsD] The median size of the serf intent queue
consul.serf.queue.intent.quantile
(gauge)
[Prometheus] The quantile for the size of the serf intent queue
consul.serf.queue.query.95percentile
(gauge)
[DogStatsD] The p95 for the size of the serf query queue
consul.serf.queue.query.avg
(gauge)
[DogStatsD] The avg size of the serf query queue
consul.serf.queue.query.count
(count)
[DogStatsD] [Prometheus] The number of items in the serf query queue
consul.serf.queue.query.max
(gauge)
[DogStatsD] The max size of the serf query queue
consul.serf.queue.query.median
(gauge)
[DogStatsD] The median size of the serf query queue
consul.serf.queue.query.quantile
(gauge)
[Prometheus] The quantile for the size of the serf query queue
consul.serf.snapshot.appendline.95percentile
(gauge)
[DogStatsD] The p95 of the time taken by the Consul agent to append an entry into the existing log.
Shown as millisecond
consul.serf.snapshot.appendline.avg
(gauge)
[DogStatsD] The avg of the time taken by the Consul agent to append an entry into the existing log.
Shown as millisecond
consul.serf.snapshot.appendline.count
(count)
[DogStatsD] [Prometheus] The number of samples of consul.serf.snapshot.appendline
consul.serf.snapshot.appendline.max
(gauge)
[DogStatsD] The max of the time taken by the Consul agent to append an entry into the existing log.
Shown as millisecond
consul.serf.snapshot.appendline.median
(gauge)
[DogStatsD] The median of the time taken by the Consul agent to append an entry into the existing log.
Shown as millisecond
consul.serf.snapshot.appendline.quantile
(gauge)
[Prometheus] The quantile of the time taken by the Consul agent to append an entry into the existing log.
Shown as millisecond
consul.serf.snapshot.compact.95percentile
(gauge)
[DogStatsD] The p95 of the time taken by the Consul agent to compact a log. This operation occurs only when the snapshot becomes large enough to justify the compaction.
Shown as millisecond
consul.serf.snapshot.compact.avg
(gauge)
[DogStatsD] The avg of the time taken by the Consul agent to compact a log. This operation occurs only when the snapshot becomes large enough to justify the compaction.
Shown as millisecond
consul.serf.snapshot.compact.count
(count)
[DogStatsD] [Prometheus] The number of samples of consul.serf.snapshot.compact
consul.serf.snapshot.compact.max
(gauge)
[DogStatsD] The max of the time taken by the Consul agent to compact a log. This operation occurs only when the snapshot becomes large enough to justify the compaction.
Shown as millisecond
consul.serf.snapshot.compact.median
(gauge)
[DogStatsD] The median of the time taken by the Consul agent to compact a log. This operation occurs only when the snapshot becomes large enough to justify the compaction.
Shown as millisecond
consul.serf.snapshot.compact.quantile
(gauge)
[Prometheus] The quantile of the time taken by the Consul agent to compact a log. This operation occurs only when the snapshot becomes large enough to justify the compaction.
Shown as millisecond

For a description of the metrics the Consul Agent sends to DogStatsD, see Consul's Telemetry documentation.

For information on how the network latency metrics are calculated, see Consul's Network Coordinates documentation.
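Consul's Network Coordinates documentation describes estimating the round-trip time between two nodes from their coordinates: the Euclidean distance between the coordinate vectors, plus both heights, plus both adjustment terms when that keeps the result positive. The Python sketch below follows that description and then summarizes one node's estimated latency to every other node with min, median, and max. It assumes coordinates shaped like the Coord objects returned by /v1/coordinate/nodes (Vec, Height, and Adjustment, in seconds); the function names are hypothetical, and this is not necessarily how the Datadog check computes its percentiles.

```python
import math
import statistics


def estimate_rtt_ms(a, b):
    """Estimate the round-trip time between two Consul coordinates, in ms."""
    if len(a["Vec"]) != len(b["Vec"]):
        raise ValueError("coordinates have different dimensionality")
    # Euclidean distance between the vectors, plus both heights.
    rtt = math.dist(a["Vec"], b["Vec"]) + a["Height"] + b["Height"]
    # Apply the adjustment terms only if the result stays positive.
    adjusted = rtt + a["Adjustment"] + b["Adjustment"]
    if adjusted > 0.0:
        rtt = adjusted
    return rtt * 1000.0  # seconds -> milliseconds


def latency_summary(node, coords):
    """Summarize latencies from `node` to every other node.

    `coords` maps node name -> coordinate dict; at least two nodes are required.
    """
    latencies = sorted(
        estimate_rtt_ms(coords[node], coord)
        for name, coord in coords.items()
        if name != node
    )
    return {
        "min": latencies[0],
        "median": statistics.median(latencies),
        "max": latencies[-1],
    }
```

For two nodes whose vectors are 0.005 s apart and whose heights are 0.001 s each, with zero adjustments, the estimate is 5 + 1 + 1 = 7 ms.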

Events

consul.new_leader:
When the Consul cluster elects a new leader, the Datadog Agent emits an event tagged with prev_consul_leader, curr_consul_leader, and consul_datacenter.
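The leader-change event can be pictured as a comparison between successive polls of the current leader, for example the address returned by Consul's /v1/status/leader endpoint. The Python sketch below is hypothetical: the LeaderTracker class and the event dictionary layout are not the check's API, and it assumes no event is emitted on the very first poll, when there is no previous leader to compare against.

```python
class LeaderTracker:
    """Hypothetical tracker that reports leader changes between polls."""

    def __init__(self, datacenter):
        self.datacenter = datacenter
        self.prev_leader = None

    def observe(self, curr_leader):
        """Return an event dict if the leader changed since the last poll, else None."""
        event = None
        if self.prev_leader is not None and curr_leader != self.prev_leader:
            event = {
                "event_type": "consul.new_leader",
                "tags": [
                    f"prev_consul_leader:{self.prev_leader}",
                    f"curr_consul_leader:{curr_leader}",
                    f"consul_datacenter:{self.datacenter}",
                ],
            }
        self.prev_leader = curr_leader
        return event
```

Calling observe() once per check run with the latest leader address yields an event only on the run where the address differs from the previous one.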

Service Checks

consul.check
Returns OK if the service is up, WARNING if there is an issue and CRITICAL when down.
Statuses: ok, warning, critical, unknown

consul.up
Returns OK if the Consul server is up, CRITICAL otherwise.
Statuses: ok, critical

consul.can_connect
Returns OK if the Agent can make HTTP requests to Consul, CRITICAL otherwise.
Statuses: ok, critical

consul.prometheus.health
Returns CRITICAL if the check cannot access the metrics endpoint, otherwise returns OK.
Statuses: ok, critical
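The consul.check statuses line up with Consul's own health-check states. The Python sketch below shows one plausible mapping, assuming Consul reports passing, warning, or critical and that anything else maps to UNKNOWN. It uses the numeric service check statuses from the Datadog API (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN); the function name to_service_check_status is hypothetical.

```python
# Datadog service check status codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumed mapping from Consul health-check states to service check statuses.
STATUS_MAP = {"passing": OK, "warning": WARNING, "critical": CRITICAL}


def to_service_check_status(consul_status):
    """Map a Consul check status to a service check status; unrecognized -> UNKNOWN."""
    return STATUS_MAP.get(consul_status.lower(), UNKNOWN)
```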

Troubleshooting

Need help? Contact Datadog support.
