Consul

Supported OS Linux Windows Mac OS

Integration version4.0.0

Consul Dash

Overview

The Datadog Agent collects many metrics from Consul nodes, including those for:

  • Total Consul peers
  • Service health - for a given service, how many of its nodes are up, passing, warning, critical?
  • Node health - for a given node, how many of its services are up, passing, warning, critical?
  • Network coordinates - inter- and intra-datacenter latencies

The Consul Agent can provide further metrics with DogStatsD. These metrics are more related to the internal health of Consul itself, not to services which depend on Consul. There are metrics for:

  • Serf events and member flaps
  • The Raft protocol
  • DNS performance

And many more.

Finally, in addition to metrics, the Datadog Agent also sends a service check for each of Consul’s health checks, and an event after each new leader election.

Setup

Installation

The Datadog Agent’s Consul check is included in the Datadog Agent package, so you don’t need to install anything else on your Consul nodes.

Configuration

Host

To configure this check for an Agent running on a host:

Metric collection
  1. Edit the consul.d/conf.yaml file, in the conf.d/ folder at the root of your Agent’s configuration directory to start collecting your Consul metrics. See the sample consul.d/conf.yaml for all available configuration options.

    init_config:
    
    instances:
      ## @param url - string - required
      ## Where your Consul HTTP server lives,
      ## point the URL at the leader to get metrics about your Consul cluster.
      ## Use HTTPS instead of HTTP if your Consul setup is configured to do so.
      #
      - url: http://localhost:8500
    
  2. Restart the Agent.

OpenMetrics

Optionally, you can enable the use_prometheus_endpoint configuration option to get an additional set of metrics from the Consul Prometheus endpoint.

Note: Use the DogStatsD or Prometheus method, do not enable both for the same instance.

  1. Configure Consul to expose metrics to the Prometheus endpoint. Set the prometheus_retention_time nested under the top-level telemetry key of the main Consul configuration file:

    {
      ...
      "telemetry": {
        "prometheus_retention_time": "360h"
      },
      ...
    }
    
  2. Edit the consul.d/conf.yaml file, in the conf.d/ folder at the root of your Agent’s configuration directory to start using the Prometheus endpoint.

    instances:
        - url: <EXAMPLE>
          use_prometheus_endpoint: true
    
  3. Restart the Agent.

DogStatsD

Instead of using the Prometheus endpoint, you can configure Consul to send the same set of additional metrics to the Agent through DogStatsD.

  1. Configure Consul to send DogStatsD metrics by adding the dogstatsd_addr nested under the top-level telemetry key in the main Consul configuration file:

    {
      ...
      "telemetry": {
        "dogstatsd_addr": "127.0.0.1:8125"
      },
      ...
    }
    
  2. Update the Datadog Agent main configuration file datadog.yaml by adding the following configs to ensure metrics are tagged correctly:

    # dogstatsd_mapper_cache_size: 1000  # default to 1000
    dogstatsd_mapper_profiles:
      - name: consul
        prefix: "consul."
        mappings:
          - match: 'consul\.http\.([a-zA-Z]+)\.(.*)'
            match_type: "regex"
            name: "consul.http.request"
            tags:
              method: "$1"
              path: "$2"
          - match: 'consul\.raft\.replication\.appendEntries\.logs\.([0-9a-f-]+)'
            match_type: "regex"
            name: "consul.raft.replication.appendEntries.logs"
            tags:
              peer_id: "$1"
          - match: 'consul\.raft\.replication\.appendEntries\.rpc\.([0-9a-f-]+)'
            match_type: "regex"
            name: "consul.raft.replication.appendEntries.rpc"
            tags:
              peer_id: "$1"
          - match: 'consul\.raft\.replication\.heartbeat\.([0-9a-f-]+)'
            match_type: "regex"
            name: "consul.raft.replication.heartbeat"
            tags:
              peer_id: "$1"
    
  3. Restart the Agent.

Log collection

Available for Agent versions >6.0

  1. Collecting logs is disabled by default in the Datadog Agent, enable it in datadog.yaml with:

    logs_enabled: true
    
  2. Edit this configuration block in your consul.yaml file to collect Consul logs:

    logs:
      - type: file
        path: /var/log/consul_server.log
        source: consul
        service: myservice
    

    Change the path and service parameter values and configure them for your environment. See the sample consul.d/conf.yaml for all available configuration options.

  3. Restart the Agent.

Containerized

For containerized environments, see the Autodiscovery Integration Templates for guidance on applying the parameters below.

Metric collection
ParameterValue
<INTEGRATION_NAME>consul
<INIT_CONFIG>blank or {}
<INSTANCE_CONFIG>{"url": "https://%%host%%:8500"}
Log collection

Available for Agent versions >6.0

Collecting logs is disabled by default in the Datadog Agent. To enable it, see Kubernetes Log Collection.

ParameterValue
<LOG_CONFIG>{"source": "consul", "service": "<SERVICE_NAME>"}

Validation

Run the Agent’s status subcommand and look for consul under the Checks section.

Note: If your Consul nodes have debug logging enabled, the Datadog Agent’s regular polling shows in the Consul log:

2017/03/27 21:38:12 [DEBUG] http: Request GET /v1/status/leader (59.344us) from=127.0.0.1:53768
2017/03/27 21:38:12 [DEBUG] http: Request GET /v1/status/peers (62.678us) from=127.0.0.1:53770
2017/03/27 21:38:12 [DEBUG] http: Request GET /v1/health/state/any (106.725us) from=127.0.0.1:53772
2017/03/27 21:38:12 [DEBUG] http: Request GET /v1/catalog/services (79.657us) from=127.0.0.1:53774
2017/03/27 21:38:12 [DEBUG] http: Request GET /v1/health/service/consul (153.917us) from=127.0.0.1:53776
2017/03/27 21:38:12 [DEBUG] http: Request GET /v1/coordinate/datacenters (71.778us) from=127.0.0.1:53778
2017/03/27 21:38:12 [DEBUG] http: Request GET /v1/coordinate/nodes (84.95us) from=127.0.0.1:53780

Consul Agent to DogStatsD

Use netstat to verify that Consul is sending its metrics, too:

$ sudo netstat -nup | grep "127.0.0.1:8125.*ESTABLISHED"
udp        0      0 127.0.0.1:53874         127.0.0.1:8125          ESTABLISHED 23176/consul

Data Collected

Metrics

consul.catalog.nodes_critical
(gauge)
[Integration] The number of nodes with service status critical from those registered
Shown as node
consul.catalog.nodes_passing
(gauge)
[Integration] The number of nodes with service status passing from those registered
Shown as node
consul.catalog.nodes_up
(gauge)
[Integration] The number of nodes
Shown as node
consul.catalog.nodes_warning
(gauge)
[Integration] The number of nodes with service status warning from those registered
Shown as node
consul.catalog.services_count
(gauge)
[Integration] Metrics to count the number of services matching criteria like the service tag, node name, or status. To be queried using the sum by aggregator.
Shown as service
consul.catalog.services_critical
(gauge)
[Integration] Total critical services on nodes
Shown as service
consul.catalog.services_passing
(gauge)
[Integration] Total passing services on nodes
Shown as service
consul.catalog.services_up
(gauge)
[Integration] Total services registered on nodes
Shown as service
consul.catalog.services_warning
(gauge)
[Integration] Total warning services on nodes
Shown as service
consul.catalog.total_nodes
(gauge)
[Integration] The number of nodes registered in the consul cluster
Shown as node
consul.client.rpc
(count)
[DogStatsD] [Prometheus] This increments whenever a Consul agent in client mode makes an RPC request to a Consul server. This gives a measure of how much a given agent is loading the Consul servers. This is only generated by agents in client mode, not Consul servers.
Shown as request
consul.client.rpc.failed
(count)
[DogStatsD] [Prometheus] Increments whenever a Consul agent in client mode makes an RPC request to a Consul server and fails
Shown as request
consul.http.request
(gauge)
[DogStatsD] Tracks how long it takes to service the given HTTP request for the given verb and path. Using a DogStatsD mapper as described in the README, the paths are mapped to tags and do not include details like service or key names. For these paths, an underscore is present as a placeholder, for example: http_method:GET, path:v1.kv._)
Shown as millisecond
consul.http.request.count
(count)
[Prometheus] A count of how long it takes to service the given HTTP request for the given verb and path. It includes labels for path and method. Path does not include details like service or key names. For these paths, an underscore is present as a placeholder, for example: path=v1.kv._)
Shown as millisecond
consul.http.request.quantile
(gauge)
[Prometheus] A quantile of how long it takes to service the given HTTP request for the given verb and path. Includes labels for path and method. Path does not include details like service or key names. For these paths, an underscore is present as a placeholder, for example: path=v1.kv._)
Shown as millisecond
consul.http.request.sum
(count)
[Prometheus] The sum of how long it takes to service the given HTTP request for the given verb and path. Includes labels for path and method. Path does not include details like service or key names. For these paths, an underscore is present as a placeholder, for example: path=v1.kv._)
Shown as millisecond
consul.memberlist.degraded.probe
(gauge)
[DogStatsD] [Prometheus] This metric counts the number of times the Consul agent has performed failure detection on another agent at a slower probe rate. The agent uses its own health metric as an indicator to perform this action. If its health score is low, it means that the node is healthy, and vice versa.
consul.memberlist.gossip.95percentile
(gauge)
[DogStatsD] The p95 for the number of gossips (messages) broadcasted to a set of randomly selected nodes.
Shown as message
consul.memberlist.gossip.avg
(gauge)
[DogStatsD] The avg for the number of gossips (messages) broadcasted to a set of randomly selected nodes.
Shown as message
consul.memberlist.gossip.count
(count)
[DogStatsD] [Prometheus] The number of samples of consul.memberlist.gossip
consul.memberlist.gossip.max
(gauge)
[DogStatsD] The max for the number of gossips (messages) broadcasted to a set of randomly selected nodes.
Shown as message
consul.memberlist.gossip.median
(gauge)
[DogStatsD] The median for the number of gossips (messages) broadcasted to a set of randomly selected nodes.
Shown as message
consul.memberlist.gossip.quantile
(gauge)
[Prometheus] The quantile for the number of gossips (messages) broadcasted to a set of randomly selected nodes.
Shown as message
consul.memberlist.gossip.sum
(count)
[DogStatsD] [Prometheus] The sum of the number of gossips (messages) broadcasted to a set of randomly selected nodes.
Shown as message
consul.memberlist.health.score
(gauge)
[DogStatsD] [Prometheus] This metric describes a node's perception of its own health based on how well it is meeting the soft real-time requirements of the protocol. This metric ranges from 0 to 8, where 0 indicates "totally healthy". For more details see section IV of the Lifeguard paper: https://arxiv.org/pdf/1707.00788.pdf
consul.memberlist.msg.alive
(count)
[DogStatsD] [Prometheus] This metric counts the number of alive Consul agents, that the agent has mapped out so far, based on the message information given by the network layer.
consul.memberlist.msg.dead
(count)
[DogStatsD] [Prometheus] This metric counts the number of times a Consul agent has marked another agent to be a dead node.
Shown as message
consul.memberlist.msg.suspect
(count)
[DogStatsD] [Prometheus] The number of times a Consul agent suspects another as failed while probing during gossip protocol
consul.memberlist.probenode.95percentile
(gauge)
[DogStatsD] The p95 for the time taken to perform a single round of failure detection on a select Consul agent.
Shown as node
consul.memberlist.probenode.avg
(gauge)
[DogStatsD] The avg for the time taken to perform a single round of failure detection on a select Consul agent.
Shown as node
consul.memberlist.probenode.count
(count)
[DogStatsD] [Prometheus] The number of samples of consul.memberlist.probenode
consul.memberlist.probenode.max
(gauge)
[DogStatsD] The max for the time taken to perform a single round of failure detection on a select Consul agent.
Shown as node
consul.memberlist.probenode.median
(gauge)
[DogStatsD] The median for the time taken to perform a single round of failure detection on a select Consul agent.
Shown as node
consul.memberlist.probenode.quantile
(gauge)
[Prometheus] The quantile for the time taken to perform a single round of failure detection on a select Consul agent.
Shown as node
consul.memberlist.probenode.sum
(count)
[DogStatsD] [Prometheus] The sum for the time taken to perform a single round of failure detection on a select Consul agent.
Shown as node
consul.memberlist.pushpullnode.95percentile
(gauge)
[DogStatsD] The p95 for the number of Consul agents that have exchanged state with this agent.
Shown as node
consul.memberlist.pushpullnode.avg
(gauge)
[DogStatsD] The avg for the number of Consul agents that have exchanged state with this agent.
Shown as node
consul.memberlist.pushpullnode.count
(count)
[DogStatsD] [Prometheus] The number of samples of consul.memberlist.pushpullnode
consul.memberlist.pushpullnode.max
(gauge)
[DogStatsD] The max for the number of Consul agents that have exchanged state with this agent.
Shown as node
consul.memberlist.pushpullnode.median
(gauge)
[DogStatsD] The median for the number of Consul agents that have exchanged state with this agent.
Shown as node
consul.memberlist.pushpullnode.quantile
(gauge)
[Prometheus] The quantile for the number of Consul agents that have exchanged state with this agent.
consul.memberlist.pushpullnode.sum
(count)
[DogStatsD] [Prometheus] The sum for the number of Consul agents that have exchanged state with this agent.
consul.memberlist.tcp.accept
(count)
[DogStatsD] [Prometheus] This metric counts the number of times a Consul agent has accepted an incoming TCP stream connection.
Shown as connection
consul.memberlist.tcp.connect
(count)
[DogStatsD] [Prometheus] This metric counts the number of times a Consul agent has initiated a push/pull sync with an other agent.
Shown as connection
consul.memberlist.tcp.sent
(count)
[DogStatsD] [Prometheus] This metric measures the total number of bytes sent by a Consul agent through the TCP protocol
Shown as byte
consul.memberlist.udp.received
(count)
[DogStatsD] [Prometheus] This metric measures the total number of bytes sent/received by a Consul agent through the UDP protocol.
Shown as byte
consul.memberlist.udp.sent
(count)
[DogStatsD] [Prometheus] This metric measures the total number of bytes sent/received by a Consul agent through the UDP protocol.
Shown as byte
consul.net.node.latency.max
(gauge)
[Integration] Maximum latency from this node to all others
Shown as millisecond
consul.net.node.latency.median
(gauge)
[Integration] Median latency from this node to all others
Shown as millisecond
consul.net.node.latency.min
(gauge)
[Integration] Minimum latency from this node to all others
Shown as millisecond
consul.net.node.latency.p25
(gauge)
[Integration] P25 latency from this node to all others
Shown as millisecond
consul.net.node.latency.p75
(gauge)
[Integration] P75 latency from this node to all others
Shown as millisecond
consul.net.node.latency.p90
(gauge)
[Integration] P90 latency from this node to all others
Shown as millisecond
consul.net.node.latency.p95
(gauge)
[Integration] P95 latency from this node to all others
Shown as millisecond
consul.net.node.latency.p99
(gauge)
[Integration] P99 latency from this node to all others
Shown as millisecond
consul.peers
(gauge)
[Integration] The number of peers in the peer set
consul.raft.apply
(count)
[DogStatsD] [Prometheus] The number of raft transactions occurring
Shown as transaction
consul.raft.commitTime.95percentile
(gauge)
[DogStatsD] The p95 time it takes to commit a new entry to the raft log on the leader
Shown as millisecond
consul.raft.commitTime.avg
(gauge)
[DogStatsD] The average time it takes to commit a new entry to the raft log on the leader
Shown as millisecond
consul.raft.commitTime.count
(count)
[DogStatsD] [Prometheus] The number of samples of raft.commitTime
consul.raft.commitTime.max
(gauge)
[DogStatsD] The max time it takes to commit a new entry to the raft log on the leader
Shown as millisecond
consul.raft.commitTime.median
(gauge)
[DogStatsD] The median time it takes to commit a new entry to the raft log on the leader
Shown as millisecond
consul.raft.commitTime.quantile
(gauge)
[Prometheus] The quantile time it takes to commit a new entry to the raft log on the leader
Shown as millisecond
consul.raft.commitTime.sum
(count)
[DogStatsD] [Prometheus] The sum of the time it takes to commit a new entry to the raft log on the leader
Shown as millisecond
consul.raft.leader.dispatchLog.95percentile
(gauge)
[DogStatsD] The p95 time it takes for the leader to write log entries to disk
Shown as millisecond
consul.raft.leader.dispatchLog.avg
(gauge)
[DogStatsD] The average time it takes for the leader to write log entries to disk
Shown as millisecond
consul.raft.leader.dispatchLog.count
(count)
[DogStatsD] [Prometheus] The number of samples of raft.leader.dispatchLog
consul.raft.leader.dispatchLog.max
(gauge)
[DogStatsD] The max time it takes for the leader to write log entries to disk
Shown as millisecond
consul.raft.leader.dispatchLog.median
(gauge)
[DogStatsD] The median time it takes for the leader to write log entries to disk
Shown as millisecond
consul.raft.leader.dispatchLog.quantile
(gauge)
[Prometheus] The quantile time it takes for the leader to write log entries to disk
Shown as millisecond
consul.raft.leader.dispatchLog.sum
(count)
[DogStatsD] [Prometheus] The sum of the time it takes for the leader to write log entries to disk
Shown as millisecond
consul.raft.leader.lastContact.95percentile
(gauge)
[DogStatsD] The p95 time elapsed since the leader was last able to check its lease with followers
Shown as millisecond
consul.raft.leader.lastContact.avg
(gauge)
[DogStatsD] The average time elapsed since the leader was last able to check its lease with followers
Shown as millisecond
consul.raft.leader.lastContact.count
(count)
[DogStatsD] [Prometheus] The number of samples of raft.leader.lastContact
consul.raft.leader.lastContact.max
(gauge)
[DogStatsD] The max time elapsed since the leader was last able to check its lease with followers
Shown as millisecond
consul.raft.leader.lastContact.median
(gauge)
[DogStatsD] The median time elapsed since the leader was last able to check its lease with followers
Shown as millisecond
consul.raft.leader.lastContact.quantile
(gauge)
[Prometheus] The quantile time elapsed since the leader was last able to check its lease with followers
Shown as millisecond
consul.raft.leader.lastContact.sum
(count)
[DogStatsD] [Prometheus] The sum of the time elapsed since the leader was last able to check its lease with followers
Shown as millisecond
consul.raft.replication.appendEntries.logs
(count)
[DogStatsD] [Prometheus] Measures the number of logs replicated to an agent, to bring it up to speed with the leader's logs.
Shown as entry
consul.raft.replication.appendEntries.rpc.count
(count)
[DogStatsD] [Prometheus] The count the time taken by the append entries RFC to replicate the log entries of a leader agent onto its follower agent(s)
Shown as millisecond
consul.raft.replication.appendEntries.rpc.quantile
(gauge)
[Prometheus] The quantile of the time taken by the append entries RFC to replicate the log entries of a leader agent onto its follower agent(s)
Shown as millisecond
consul.raft.replication.appendEntries.rpc.sum
(count)
[DogStatsD] [Prometheus] The sum the time taken by the append entries RFC to replicate the log entries of a leader agent onto its follower agent(s)
Shown as millisecond
consul.raft.replication.heartbeat.count
(count)
[DogStatsD] [Prometheus] The count the time taken to invoke appendEntries on a peer.
Shown as millisecond
consul.raft.replication.heartbeat.quantile
(gauge)
[Prometheus] The quantile of the time taken to invoke appendEntries on a peer.
Shown as millisecond
consul.raft.replication.heartbeat.sum
(count)
[DogStatsD] [Prometheus] The sum of the time taken to invoke appendEntries on a peer.
Shown as millisecond
consul.raft.state.candidate
(count)
[DogStatsD] [Prometheus]The number of initiated leader elections
Shown as event
consul.raft.state.leader
(count)
[DogStatsD] [Prometheus] The number of completed leader elections
Shown as event
consul.runtime.gc_pause_ns.95percentile
(gauge)
[DogStatsD] The p95 for the number of nanoseconds consumed by stop-the-world garbage collection (GC) pauses since Consul started.
Shown as nanosecond
consul.runtime.gc_pause_ns.avg
(gauge)
[DogStatsD] The avg for the number of nanoseconds consumed by stop-the-world garbage collection (GC) pauses since Consul started.
Shown as nanosecond
consul.runtime.gc_pause_ns.count
(count)
[DogStatsD] [Prometheus] The number of samples of consul.runtime.gcpausens
consul.runtime.gc_pause_ns.max
(gauge)
[DogStatsD] The max for the number of nanoseconds consumed by stop-the-world garbage collection (GC) pauses since Consul started.
Shown as nanosecond
consul.runtime.gc_pause_ns.median
(gauge)
[DogStatsD] The median for the number of nanoseconds consumed by stop-the-world garbage collection (GC) pauses since Consul started.
Shown as nanosecond
consul.runtime.gc_pause_ns.quantile
(gauge)
[Prometheus] The quantile of nanoseconds consumed by stop-the-world garbage collection (GC) pauses since Consul started.
Shown as nanosecond
consul.runtime.gc_pause_ns.sum
(count)
[DogStatsD] [Prometheus] The sum of nanoseconds consumed by stop-the-world garbage collection (GC) pauses since Consul started.
Shown as nanosecond
consul.serf.coordinate.adjustment_ms.95percentile
(gauge)
[DogStatsD] The p95 in milliseconds for the node coordinate adjustment
Shown as millisecond
consul.serf.coordinate.adjustment_ms.avg
(gauge)
[DogStatsD] The avg in milliseconds for the node coordinate adjustment
Shown as millisecond
consul.serf.coordinate.adjustment_ms.count
(count)
[DogStatsD] [Prometheus] The number of samples of consul.serf.coordinate.adjustment_ms
consul.serf.coordinate.adjustment_ms.max
(gauge)
[DogStatsD] The max in milliseconds for the node coordinate adjustment
Shown as millisecond
consul.serf.coordinate.adjustment_ms.median
(gauge)
[DogStatsD] The median in milliseconds for the node coordinate adjustment
Shown as millisecond
consul.serf.coordinate.adjustment_ms.quantile
(gauge)
[Prometheus] The quantile in milliseconds for the node coordinate adjustment
Shown as millisecond
consul.serf.coordinate.adjustment_ms.sum
(count)
[DogStatsD] [Prometheus] The sum in milliseconds for the node coordinate adjustment
Shown as millisecond
consul.serf.events
(count)
[DogStatsD] [Prometheus] This increments when a Consul agent processes a serf event
Shown as event
consul.serf.member.failed
(count)
[DogStatsD] [Prometheus] This increments when a Consul agent is marked dead. This can be an indicator of overloaded agents, network problems, or configuration errors where agents cannot connect to each other on the required ports.
consul.serf.member.flap
(count)
[DogStatsD] [Prometheus] The number of times a Consul agent is marked dead and then quickly recovers
consul.serf.member.join
(count)
[DogStatsD] [Prometheus] This increments when a Consul agent processes a join event
Shown as event
consul.serf.member.left
(count)
[DogStatsD] [Prometheus] This increments when a Consul agent leaves the cluster.
consul.serf.member.update
(count)
[DogStatsD] [Prometheus] This increments when a Consul agent updates.
consul.serf.msgs.received.95percentile
(gauge)
[DogStatsD] The p95 for the number of serf messages received
Shown as message
consul.serf.msgs.received.avg
(gauge)
[DogStatsD] The avg for the number of serf messages received
Shown as message
consul.serf.msgs.received.count
(count)
[DogStatsD] [Prometheus] The count of serf messages received
consul.serf.msgs.received.max
(gauge)
[DogStatsD] The max for the number of serf messages received
Shown as message
consul.serf.msgs.received.median
(gauge)
[DogStatsD] The median for the number of serf messages received
Shown as message
consul.serf.msgs.received.quantile
(gauge)
[Prometheus] The quantile for the number of serf messages received
Shown as message
consul.serf.msgs.received.sum
(count)
[DogStatsD] [Prometheus] The sum for the number of serf messages received
Shown as message
consul.serf.msgs.sent.95percentile
(gauge)
[DogStatsD] The p95 for the number of serf messages sent
Shown as message
consul.serf.msgs.sent.avg
(gauge)
[DogStatsD] The avg for the number of serf messages sent
Shown as message
consul.serf.msgs.sent.count
(count)
[DogStatsD] [Prometheus] The count of serf messages sent
consul.serf.msgs.sent.max
(gauge)
[DogStatsD] The max for the number of serf messages sent
Shown as message
consul.serf.msgs.sent.median
(gauge)
[DogStatsD] The median for the number of serf messages sent
Shown as message
consul.serf.msgs.sent.quantile
(gauge)
[Prometheus] The quantile for the number of serf messages sent
Shown as message
consul.serf.msgs.sent.sum
(count)
[DogStatsD] [Prometheus] The sum of the number of serf messages sent
Shown as message
consul.serf.queue.event.95percentile
(gauge)
[DogStatsD] The p95 for the size of the serf event queue
consul.serf.queue.event.avg
(gauge)
[DogStatsD] The avg size of the serf event queue
consul.serf.queue.event.count
(count)
[DogStatsD] [Prometheus] The number of items in the serf event queue
consul.serf.queue.event.max
(gauge)
[DogStatsD] The max size of the serf event queue
consul.serf.queue.event.median
(gauge)
[DogStatsD] The median size of the serf event queue
consul.serf.queue.event.quantile
(gauge)
[Prometheus] The quantile for the size of the serf event queue
consul.serf.queue.intent.95percentile
(gauge)
[DogStatsD] The p95 for the size of the serf intent queue
consul.serf.queue.intent.avg
(gauge)
[DogStatsD] The avg size of the serf intent queue
consul.serf.queue.intent.count
(count)
[DogStatsD] [Prometheus] The number of items in the serf intent queue
consul.serf.queue.intent.max
(gauge)
[DogStatsD] The max size of the serf intent queue
consul.serf.queue.intent.median
(gauge)
[DogStatsD] The median size of the serf intent queue
consul.serf.queue.intent.quantile
(gauge)
[Prometheus] The quantile for the size of the serf intent queue
consul.serf.queue.query.95percentile
(gauge)
[DogStatsD] The p95 for the size of the serf query queue
consul.serf.queue.query.avg
(gauge)
[DogStatsD] The avg size of the serf query queue
consul.serf.queue.query.count
(count)
[DogStatsD] [Prometheus] The number of items in the serf query queue
consul.serf.queue.query.max
(gauge)
[DogStatsD] The max size of the serf query queue
consul.serf.queue.query.median
(gauge)
[DogStatsD] The median size of the serf query queue
consul.serf.queue.query.quantile
(gauge)
[Prometheus] The quantile for the size of the serf query queue
consul.serf.snapshot.appendline.95percentile
(gauge)
[DogStatsD] The p95 of the time taken by the Consul agent to append an entry into the existing log.
Shown as millisecond
consul.serf.snapshot.appendline.avg
(gauge)
[DogStatsD] The avg of the time taken by the Consul agent to append an entry into the existing log.
Shown as millisecond
consul.serf.snapshot.appendline.count
(count)
[DogStatsD] [Prometheus] The number of samples of consul.serf.snapshot.appendline
consul.serf.snapshot.appendline.max
(gauge)
[DogStatsD] The max of the time taken by the Consul agent to append an entry into the existing log.
Shown as millisecond
consul.serf.snapshot.appendline.median
(gauge)
[DogStatsD] The median of the time taken by the Consul agent to append an entry into the existing log.
Shown as millisecond
consul.serf.snapshot.appendline.quantile
(gauge)
[Prometheus] The quantile of the time taken by the Consul agent to append an entry into the existing log.
Shown as millisecond
consul.serf.snapshot.compact.95percentile
(gauge)
[DogStatsD] The p95 of the time taken by the Consul agent to compact a log. This operation occurs only when the snapshot becomes large enough to justify the compaction .
Shown as millisecond
consul.serf.snapshot.compact.avg
(gauge)
[DogStatsD] The avg of the time taken by the Consul agent to compact a log. This operation occurs only when the snapshot becomes large enough to justify the compaction .
Shown as millisecond
consul.serf.snapshot.compact.count
(count)
[DogStatsD] [Prometheus] The number of samples of consul.serf.snapshot.compact
consul.serf.snapshot.compact.max
(gauge)
[DogStatsD] The max of the time taken by the Consul agent to compact a log. This operation occurs only when the snapshot becomes large enough to justify the compaction .
Shown as millisecond
consul.serf.snapshot.compact.median
(gauge)
[DogStatsD] The median of the time taken by the Consul agent to compact a log. This operation occurs only when the snapshot becomes large enough to justify the compaction .
Shown as millisecond
consul.serf.snapshot.compact.quantile
(gauge)
[Prometheus] The quantile of the time taken by the Consul agent to compact a log. This operation occurs only when the snapshot becomes large enough to justify the compaction .
Shown as millisecond

See Consul’s Telemetry doc for a description of metrics the Consul Agent sends to DogStatsD.

See Consul’s Network Coordinates doc for details on how the network latency metrics are calculated.

Events

consul.new_leader:
The Datadog Agent emits an event when the Consul cluster elects a new leader, tagging it with prev_consul_leader, curr_consul_leader, and consul_datacenter.

Service Checks

consul.check
Returns OK if the service is up, WARNING if there is an issue and CRITICAL when down.
Statuses: ok, warning, critical, unknown

consul.up
Returns OK if the consul server is up, CRITICAL otherwise.
Statuses: ok, critical

consul.can_connect
Returns OK if the Agent can make HTTP requests to consul, CRITICAL otherwise.
Statuses: ok, critical

consul.prometheus.health
Returns CRITICAL if the check cannot access the metrics endpoint, otherwise returns OK.
Statuses: ok, critical

Troubleshooting

Need help? Contact Datadog support.

Further Reading

Additional helpful documentation, links, and articles:

PREVIEWING: safchain/fix-custom-agent