Confluent Platform

Supported OS: Linux, Windows, macOS

Integration version: 1.10.2

Overview

This check monitors Confluent Platform and Kafka components through the Datadog Agent.

This integration collects JMX metrics for the following components:

  • Broker
  • Connect
  • Replicator
  • Schema Registry
  • ksqlDB Server
  • Streams
  • REST Proxy

Setup

Installation

The Confluent Platform check is included in the Datadog Agent package. No additional installation is needed on your Confluent Platform component server.

Note: This check collects metrics with JMX. A JVM is required on each node so the Agent can run jmxfetch. It is recommended to use an Oracle-provided JVM.
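If the Agent cannot find a Java executable on a node, the JMX-based integrations accept a java_bin_path instance option in confluent_platform.d/conf.yaml (the file described under Configuration). The fragment below is a minimal sketch; the JVM path is a hypothetical location and the port is borrowed from the sample configuration in this page.

```yaml
instances:
  - host: localhost
    port: 8686
    name: broker_instance
    # Hypothetical path: point this at the java binary installed on this node.
    java_bin_path: /usr/lib/jvm/java-11-openjdk-amd64/bin/java
```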

Configuration

  1. Edit the confluent_platform.d/conf.yaml file, in the conf.d/ folder at the root of your Agent’s configuration directory, to collect your Confluent Platform performance data. See the sample confluent_platform.d/conf.yaml for all available configuration options.

    The metrics collected by default are listed in the metrics.yaml file. To collect JMX metrics, create a separate instance for each component, for example:

    instances:
     - host: localhost
       port: 8686
       name: broker_instance
       user: username
       password: password
     - host: localhost
       port: 8687
       name: schema_registry_instance
     - host: localhost
       port: 8688
       name: rest_proxy_instance
    
  2. Restart the Agent.
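The same one-instance-per-component pattern extends to the other components. The sketch below adds a Connect worker and a ksqlDB Server; the ports, instance names, and tag values are assumptions chosen for illustration, so substitute the JMX ports your components actually expose.

```yaml
instances:
  - host: localhost
    port: 8689               # hypothetical JMX port of the Connect worker
    name: connect_instance
    tags:
      - component:connect    # hypothetical tag, useful for filtering dashboards
  - host: localhost
    port: 8690               # hypothetical JMX port of ksqlDB Server
    name: ksql_instance
    tags:
      - component:ksqldb
```

Tagging each instance makes it easier to tell components apart when several of them run on the same host.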

Log collection

Available for Agent versions >6.0

  1. Log collection is disabled by default in the Datadog Agent. Enable it in datadog.yaml:

    logs_enabled: true
    
  2. Add this configuration block to your confluent_platform.d/conf.yaml file to start collecting your Confluent Platform component logs:

      logs:
        - type: file
          path: <CONFLUENT_COMPONENT_PATH>/logs/*.log
          source: confluent_platform
          service: <SERVICE_NAME>
          log_processing_rules:
            - type: multi_line
              name: new_log_start_with_date
              pattern: \[\d{4}\-\d{2}\-\d{2}
    

    Change the path and service parameter values to match your environment. See the sample confluent_platform.d/conf.yaml for all available configuration options.

  3. Restart the Agent.
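When several components run on one host, each can get its own entry under logs with a distinct service value. The sketch below assumes two components; the path placeholders and service names are hypothetical. The multi_line rule is the same as in the block above: a line that does not start with "[" followed by a date such as 2024-01-31 is appended to the previous log entry, which keeps Java stack traces together.

```yaml
logs:
  - type: file
    path: <BROKER_PATH>/logs/*.log             # hypothetical placeholder
    source: confluent_platform
    service: kafka-broker                      # hypothetical service name
    log_processing_rules:
      - type: multi_line
        name: new_log_start_with_date
        pattern: \[\d{4}\-\d{2}\-\d{2}
  - type: file
    path: <SCHEMA_REGISTRY_PATH>/logs/*.log    # hypothetical placeholder
    source: confluent_platform
    service: schema-registry                   # hypothetical service name
    log_processing_rules:
      - type: multi_line
        name: new_log_start_with_date
        pattern: \[\d{4}\-\d{2}\-\d{2}
```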

Metric collection

For containerized environments, see the Autodiscovery with JMX guide.
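As a rough sketch of the approach that guide describes, a Kubernetes pod can carry the check configuration in an Autodiscovery annotation. The pod name, the container name broker, the image placeholder, and port 8686 are assumptions for illustration. The guide remains the authority on the exact annotation format and on which Agent image includes JMX support.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kafka-broker                 # hypothetical pod name
  annotations:
    # "broker" must match the container name declared under spec.containers.
    ad.datadoghq.com/broker.checks: |
      {
        "confluent_platform": {
          "init_config": {"is_jmx": true, "collect_default_metrics": true},
          "instances": [
            {"host": "%%host%%", "port": "8686", "name": "broker_instance"}
          ]
        }
      }
spec:
  containers:
    - name: broker
      image: <BROKER_IMAGE>          # placeholder for your broker image
```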

Validation

Run the Agent’s status subcommand and look for confluent_platform under the JMXFetch section.

    ========
    JMXFetch
    ========

      Initialized checks
      ==================
        confluent_platform
          instance_name : confluent_platform-localhost-31006
          message :
          metric_count : 26
          service_check_count : 0
          status : OK

Data Collected

Metrics
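Each metric name below is an alias for a JMX bean attribute exposed by a component. The fragment sketches how such a mapping can be expressed with a JMXFetch include filter in conf.yaml. The bean shown is Kafka's standard active-controller MBean. The integration's own metrics.yaml may define the mapping differently, so treat this as an illustration only.

```yaml
instances:
  - host: localhost
    port: 8686
    name: broker_instance
    conf:
      - include:
          # Standard Kafka broker MBean that reports the active controller count.
          bean: kafka.controller:type=KafkaController,name=ActiveControllerCount
          attribute:
            Value:
              # Alias matches the metric name listed in this section.
              alias: confluent.kafka.controller.active_controller_count
              metric_type: gauge
```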

confluent.kafka.cluster.partition.under_min_isr
(gauge)
Number of partitions whose in-sync replicas count is less than minIsr. These partitions will be unavailable to producers who use acks=all.
confluent.kafka.connect.connect_metrics.failed_authentication_rate
(gauge)
Failed authentication rate
confluent.kafka.connect.connect_metrics.failed_authentication_total
(gauge)
Failed authentication total
confluent.kafka.connect.connect_metrics.incoming_byte_rate
(gauge)
Incoming byte rate
Shown as byte
confluent.kafka.connect.connect_metrics.outgoing_byte_rate
(gauge)
Outgoing byte rate
Shown as byte
confluent.kafka.connect.connect_metrics.successful_authentication_rate
(gauge)
Successful authentication rate
confluent.kafka.connect.connect_metrics.successful_authentication_total
(gauge)
Successful authentication total
confluent.kafka.connect.connector_metrics.status
(gauge)
Status of connector
confluent.kafka.connect.connector_task.batch_size_avg
(gauge)
The average size of the batches processed by the connector.
confluent.kafka.connect.connector_task.batch_size_max
(gauge)
The maximum size of the batches processed by the connector.
confluent.kafka.connect.connector_task.offset_commit_avg_time_ms
(gauge)
The average time in milliseconds taken by this task to commit offsets.
Shown as millisecond
confluent.kafka.connect.connector_task.offset_commit_failure_percentage
(gauge)
The average percentage of this task's offset commit attempts that failed.
Shown as percent
confluent.kafka.connect.connector_task.offset_commit_max_time_ms
(gauge)
The maximum time in milliseconds taken by this task to commit offsets.
Shown as millisecond
confluent.kafka.connect.connector_task.offset_commit_success_percentage
(gauge)
The average percentage of this task's offset commit attempts that succeeded.
Shown as percent
confluent.kafka.connect.connector_task.pause_ratio
(gauge)
The fraction of time this task has spent in the pause state.
Shown as fraction
confluent.kafka.connect.connector_task.running_ratio
(gauge)
The fraction of time this task has spent in the running state.
Shown as fraction
confluent.kafka.connect.sink_task.offset_commit_completion_rate
(gauge)
The average per-second number of offset commit completions that were completed successfully.
Shown as commit
confluent.kafka.connect.sink_task.offset_commit_completion_total
(gauge)
The total number of offset commit completions that were completed successfully.
Shown as commit
confluent.kafka.connect.sink_task.offset_commit_seq_no
(gauge)
The current sequence number for offset commits.
confluent.kafka.connect.sink_task.offset_commit_skip_rate
(gauge)
The average per-second number of offset commit completions that were received too late and skipped/ignored.
Shown as commit
confluent.kafka.connect.sink_task.offset_commit_skip_total
(gauge)
The total number of offset commit completions that were received too late and skipped/ignored.
Shown as commit
confluent.kafka.connect.sink_task.partition_count
(gauge)
The number of topic partitions assigned to this task belonging to the named sink connector in this worker.
confluent.kafka.connect.sink_task.put_batch_avg_time_ms
(gauge)
The average time taken by this task to put a batch of sink records.
Shown as millisecond
confluent.kafka.connect.sink_task.put_batch_max_time_ms
(gauge)
The maximum time taken by this task to put a batch of sink records.
Shown as millisecond
confluent.kafka.connect.sink_task.sink_record_active_count
(gauge)
The number of records that have been read from Kafka but not yet completely committed/flushed/acknowledged by the sink task.
Shown as record
confluent.kafka.connect.sink_task.sink_record_active_count_avg
(gauge)
The average number of records that have been read from Kafka but not yet completely committed/flushed/acknowledged by the sink task.
Shown as record
confluent.kafka.connect.sink_task.sink_record_active_count_max
(gauge)
The maximum number of records that have been read from Kafka but not yet completely committed/flushed/acknowledged by the sink task.
Shown as record
confluent.kafka.connect.sink_task.sink_record_lag_max
(gauge)
The maximum lag in terms of number of records that the sink task is behind the consumer's position for any topic partitions.
Shown as record
confluent.kafka.connect.sink_task.sink_record_read_rate
(gauge)
The average per-second number of records read from Kafka for this task belonging to the named sink connector in this worker. This is before transformations are applied.
Shown as record
confluent.kafka.connect.sink_task.sink_record_read_total
(gauge)
The total number of records read from Kafka by this task belonging to the named sink connector in this worker, since the task was last restarted.
Shown as record
confluent.kafka.connect.sink_task.sink_record_send_rate
(gauge)
The average per-second number of records output from the transformations and sent/put to this task belonging to the named sink connector in this worker. This is after transformations are applied and excludes any records filtered out by the transformations.
Shown as record
confluent.kafka.connect.sink_task.sink_record_send_total
(gauge)
The total number of records output from the transformations and sent/put to this task belonging to the named sink connector in this worker, since the task was last restarted.
Shown as record
confluent.kafka.connect.source_task.poll_batch_avg_time_ms
(gauge)
The average time in milliseconds taken by this task to poll for a batch of source records.
Shown as millisecond
confluent.kafka.connect.source_task.poll_batch_max_time_ms
(gauge)
The maximum time in milliseconds taken by this task to poll for a batch of source records.
Shown as millisecond
confluent.kafka.connect.source_task.source_record_active_count
(gauge)
The number of records that have been produced by this task but not yet completely written to Kafka.
Shown as record
confluent.kafka.connect.source_task.source_record_active_count_avg
(gauge)
The average number of records that have been produced by this task but not yet completely written to Kafka.
Shown as record
confluent.kafka.connect.source_task.source_record_active_count_max
(gauge)
The maximum number of records that have been produced by this task but not yet completely written to Kafka.
Shown as record
confluent.kafka.connect.source_task.source_record_poll_rate
(gauge)
The average per-second number of records produced/polled (before transformation) by this task belonging to the named source connector in this worker.
Shown as record
confluent.kafka.connect.source_task.source_record_poll_total
(gauge)
The total number of records produced/polled (before transformation) by this task belonging to the named source connector in this worker.
Shown as record
confluent.kafka.connect.source_task.source_record_write_rate
(gauge)
The average per-second number of records output from the transformations and written to Kafka for this task belonging to the named source connector in this worker. This is after transformations are applied and excludes any records filtered out by the transformations.
Shown as record
confluent.kafka.connect.source_task.source_record_write_total
(gauge)
The number of records output from the transformations and written to Kafka for this task belonging to the named source connector in this worker, since the task was last restarted.
Shown as record
confluent.kafka.connect.task_error.deadletterqueue_produce_failures
(gauge)
The number of failed writes to the dead letter queue.
confluent.kafka.connect.task_error.deadletterqueue_produce_requests
(gauge)
The number of attempted writes to the dead letter queue.
confluent.kafka.connect.task_error.last_error_timestamp
(gauge)
The epoch timestamp, in milliseconds, when this task last encountered an error.
Shown as millisecond
confluent.kafka.connect.task_error.total_errors_logged
(gauge)
The number of errors that were logged.
Shown as error
confluent.kafka.connect.task_error.total_record_errors
(gauge)
The number of record processing errors in this task.
Shown as record
confluent.kafka.connect.task_error.total_record_failures
(gauge)
The number of record processing failures in this task.
Shown as record
confluent.kafka.connect.task_error.total_records_skipped
(gauge)
The number of records skipped due to errors.
Shown as record
confluent.kafka.connect.task_error.total_retries
(gauge)
The number of operations retried.
Shown as operation
confluent.kafka.connect.worker.connector_count
(gauge)
The number of connectors run in this worker.
Shown as record
confluent.kafka.connect.worker.connector_destroyed_task_count
(gauge)
The number of destroyed tasks of the connector on the worker.
Shown as task
confluent.kafka.connect.worker.connector_failed_task_count
(gauge)
The number of failed tasks of the connector on the worker.
Shown as task
confluent.kafka.connect.worker.connector_paused_task_count
(gauge)
The number of paused tasks of the connector on the worker.
Shown as task
confluent.kafka.connect.worker.connector_running_task_count
(gauge)
The number of running tasks of the connector on the worker.
Shown as task
confluent.kafka.connect.worker.connector_startup_attempts_total
(gauge)
The total number of connector startups that this worker has attempted.
confluent.kafka.connect.worker.connector_startup_failure_percentage
(gauge)
The average percentage of this worker's connector starts that failed.
Shown as percent
confluent.kafka.connect.worker.connector_startup_failure_total
(gauge)
The total number of connector starts that failed.
confluent.kafka.connect.worker.connector_startup_success_percentage
(gauge)
The average percentage of this worker's connector starts that succeeded.
Shown as percent
confluent.kafka.connect.worker.connector_startup_success_total
(gauge)
The total number of connector starts that succeeded.
confluent.kafka.connect.worker.connector_total_task_count
(gauge)
The number of tasks of the connector on the worker.
Shown as task
confluent.kafka.connect.worker.connector_unassigned_task_count
(gauge)
The number of unassigned tasks of the connector on the worker.
Shown as task
confluent.kafka.connect.worker.task_count
(gauge)
The number of tasks run in this worker.
Shown as task
confluent.kafka.connect.worker.task_startup_attempts_total
(gauge)
The total number of task startups that this worker has attempted.
confluent.kafka.connect.worker.task_startup_failure_percentage
(gauge)
The average percentage of this worker's task starts that failed.
Shown as percent
confluent.kafka.connect.worker.task_startup_failure_total
(gauge)
The total number of task starts that failed.
confluent.kafka.connect.worker.task_startup_success_percentage
(gauge)
The average percentage of this worker's task starts that succeeded.
Shown as percent
confluent.kafka.connect.worker.task_startup_success_total
(gauge)
The total number of task starts that succeeded.
confluent.kafka.connect.worker_rebalance.completed_rebalances_total
(gauge)
The total number of rebalances completed by this worker.
confluent.kafka.connect.worker_rebalance.epoch
(gauge)
The epoch or generation number of this worker.
confluent.kafka.connect.worker_rebalance.rebalance_avg_time_ms
(gauge)
The average time in milliseconds spent by this worker to rebalance.
Shown as millisecond
confluent.kafka.connect.worker_rebalance.rebalance_max_time_ms
(gauge)
The maximum time in milliseconds spent by this worker to rebalance.
Shown as millisecond
confluent.kafka.connect.worker_rebalance.rebalancing
(gauge)
Whether this worker is currently rebalancing.
confluent.kafka.connect.worker_rebalance.time_since_last_rebalance_ms
(gauge)
The time in milliseconds since this worker completed the most recent rebalance.
Shown as millisecond
confluent.kafka.consumer.bytes_consumed_rate
(gauge)
Indicates throughput of Replicator reading events from origin cluster.
Shown as byte
confluent.kafka.consumer.connection_count
(gauge)
The current number of active connections on the consumer.
Shown as connection
confluent.kafka.consumer.fetch.bytes_consumed_rate
(gauge)
The average number of bytes consumed per second.
Shown as byte
confluent.kafka.consumer.fetch.fetch_latency_avg
(gauge)
The average time taken for a fetch request.
Shown as millisecond
confluent.kafka.consumer.fetch.fetch_latency_max
(gauge)
The max time taken for a fetch request.
Shown as millisecond
confluent.kafka.consumer.fetch.fetch_rate
(gauge)
The number of fetch requests per second.
Shown as request
confluent.kafka.consumer.fetch.fetch_size_avg
(gauge)
The average number of bytes fetched per fetch request.
Shown as byte
confluent.kafka.consumer.fetch.fetch_size_max
(gauge)
The maximum number of bytes fetched per fetch request.
Shown as byte
confluent.kafka.consumer.fetch.fetch_throttle_time_avg
(gauge)
The average throttle time in ms. When quotas are enabled, the broker may delay fetch requests in order to throttle a consumer that has exceeded its limit. This metric indicates how much throttling time has been added to fetch requests on average.
Shown as millisecond
confluent.kafka.consumer.fetch.fetch_throttle_time_max
(gauge)
The maximum throttle time in ms.
Shown as millisecond
confluent.kafka.consumer.fetch.records_consumed_rate
(gauge)
The average number of records consumed per second.
Shown as record
confluent.kafka.consumer.fetch.records_lag_max
(gauge)
The maximum lag in terms of number of records for any partition in this window. An increasing value over time is your best indication that the consumer group is not keeping up with the producers.
Shown as record
confluent.kafka.consumer.fetch.records_per_request_avg
(gauge)
The average number of records in each request.
Shown as record
confluent.kafka.consumer.fetch_rate
(gauge)
The number of fetches per second.
Shown as request
confluent.kafka.consumer.fetch_size_avg
(gauge)
The average number of bytes fetched per request
Shown as byte
confluent.kafka.consumer.fetch_size_max
(gauge)
The maximum number of bytes fetched per request.
Shown as byte
confluent.kafka.consumer.fetch_throttle_time_avg
(gauge)
Fetch requests may be throttled to meet quotas configured on the origin cluster. If this average is non-zero, the origin brokers are slowing the consumer down and the quota configuration should be reviewed. For more information on quotas, see Enforcing Client Quotas.
Shown as millisecond
confluent.kafka.consumer.fetch_throttle_time_max
(gauge)
Fetch requests may be throttled to meet quotas configured on the origin cluster. If this maximum is non-zero, the origin brokers are slowing the consumer down and the quota configuration should be reviewed. For more information on quotas, see Enforcing Client Quotas.
Shown as millisecond
confluent.kafka.consumer.fetch_topic.bytes_consumed_rate
(gauge)
The average number of bytes consumed per second for a specific topic.
Shown as byte
confluent.kafka.consumer.fetch_topic.fetch_size_avg
(gauge)
The average number of bytes fetched per request for a specific topic.
Shown as byte
confluent.kafka.consumer.fetch_topic.fetch_size_max
(gauge)
The maximum number of bytes fetched per request for a specific topic.
Shown as byte
confluent.kafka.consumer.fetch_topic.records_consumed_rate
(gauge)
The average number of records consumed per second for a specific topic.
Shown as record
confluent.kafka.consumer.fetch_topic.records_per_request_avg
(gauge)
The average number of records in each request for a specific topic.
Shown as record
confluent.kafka.consumer.group.assigned_partitions
(gauge)
The number of partitions currently assigned to this consumer.
Shown as unit
confluent.kafka.consumer.group.commit_latency_avg
(gauge)
The average time taken for a commit request.
Shown as millisecond
confluent.kafka.consumer.group.commit_latency_max
(gauge)
The max time taken for a commit request.
Shown as millisecond
confluent.kafka.consumer.group.commit_rate
(gauge)
The number of commit calls per second
Shown as commit
confluent.kafka.consumer.group.heartbeat_rate
(gauge)
The average number of heartbeats per second.
Shown as operation
confluent.kafka.consumer.group.heartbeat_response_time_max
(gauge)
The max time taken to receive a response to a heartbeat request.
Shown as millisecond
confluent.kafka.consumer.group.join_rate
(gauge)
The number of group joins per second. Group joining is the first phase of the rebalance protocol. A large value indicates that the consumer group is unstable and will likely be coupled with increased lag.
Shown as operation
confluent.kafka.consumer.group.join_time_avg
(gauge)
The average time taken for a group rejoin.
Shown as millisecond
confluent.kafka.consumer.group.join_time_max
(gauge)
The max time taken for a group rejoin. This value should not get much higher than the configured session timeout for the consumer.
Shown as millisecond
confluent.kafka.consumer.group.last_heartbeat_seconds_ago
(gauge)
The number of seconds since the last controller heartbeat.
Shown as second
confluent.kafka.consumer.group.sync_rate
(gauge)
The number of group syncs per second. Group synchronization is the second and last phase of the rebalance protocol.
Shown as operation
confluent.kafka.consumer.group.sync_time_avg
(gauge)
The average time taken for a group sync.
Shown as millisecond
confluent.kafka.consumer.group.sync_time_max
(gauge)
The max time taken for a group sync.
Shown as millisecond
confluent.kafka.consumer.io_ratio
(gauge)
The fraction of time the consumer I/O thread spent doing I/O
Shown as fraction
confluent.kafka.consumer.io_wait_ratio
(gauge)
The fraction of time the consumer I/O thread spent waiting
Shown as fraction
confluent.kafka.consumer.network_io_rate
(gauge)
The number of network operations (reads or writes) on all consumer connections per second
Shown as connection
confluent.kafka.consumer.records_lag_max
(gauge)
The maximum lag in terms of number of records for any partition. An increasing value over time indicates that Replicator is not keeping up with the rate at which events are written to the origin cluster.
Shown as record
confluent.kafka.consumer.request_latency_avg
(gauge)
The average consumer request latency in ms
Shown as millisecond
confluent.kafka.consumer.request_rate
(gauge)
The number of requests sent per second by a consumer
Shown as request
confluent.kafka.consumer.response_rate
(gauge)
The number of responses received per second by a consumer
Shown as response
confluent.kafka.controller.active_controller_count
(gauge)
Number of active controllers in the cluster. Alert if the aggregated sum across all brokers in the cluster is anything other than 1 because there should be exactly one controller per cluster.
confluent.kafka.controller.global_partition_count
(gauge)
Global partition count
confluent.kafka.controller.global_topic_count
(gauge)
Global topic count
confluent.kafka.controller.global_under_min_isr_partition_count
(gauge)
Global count of partitions under the minimum ISR
confluent.kafka.controller.leader_election_rate_and_time_ms.avg
(gauge)
Leader election rate avg.
Shown as millisecond
confluent.kafka.controller.leader_election_rate_and_time_ms.rate
(gauge)
Leader election rate.
Shown as millisecond
confluent.kafka.controller.offline_partitions_count
(gauge)
Number of partitions that don't have an active leader and are hence not writable or readable. Alert if value is greater than 0.
confluent.kafka.controller.preferred_replica_imbalance_count
(gauge)
Preferred Replica Imbalance Count
confluent.kafka.controller.unclean_leader_elections_per_sec.avg
(gauge)
Unclean leader election rate avg.
Shown as unit
confluent.kafka.controller.unclean_leader_elections_per_sec.rate
(gauge)
Unclean leader election rate.
Shown as unit
confluent.kafka.log.log_flush_rate_and_time_ms.avg
(gauge)
Log flush rate avg.
Shown as millisecond
confluent.kafka.log.log_flush_rate_and_time_ms.rate
(gauge)
Log flush rate.
Shown as millisecond
confluent.kafka.log.size
(gauge)
Log size per topic
Shown as byte
confluent.kafka.network.request.local_time_ms.50percentile
(gauge)
Time the request is processed at the leader (50percentile).
Shown as millisecond
confluent.kafka.network.request.local_time_ms.75percentile
(gauge)
Time the request is processed at the leader (75percentile).
Shown as millisecond
confluent.kafka.network.request.local_time_ms.95percentile
(gauge)
Time the request is processed at the leader (95percentile).
Shown as millisecond
confluent.kafka.network.request.local_time_ms.98percentile
(gauge)
Time the request is processed at the leader (98percentile).
Shown as millisecond
confluent.kafka.network.request.local_time_ms.999percentile
(gauge)
Time the request is processed at the leader (99.9th percentile).
Shown as millisecond
confluent.kafka.network.request.local_time_ms.99percentile
(gauge)
Time the request is processed at the leader (99percentile).
Shown as millisecond
confluent.kafka.network.request.local_time_ms.avg
(gauge)
Time the request is processed at the leader (avg).
Shown as millisecond
confluent.kafka.network.request.local_time_ms.rate
(gauge)
Time the request is processed at the leader (rate).
Shown as millisecond
confluent.kafka.network.request.remote_time_ms.50percentile
(gauge)
Time the request waits for the follower. This is non-zero for produce requests when acks=all (50percentile).
Shown as millisecond
confluent.kafka.network.request.remote_time_ms.75percentile
(gauge)
Time the request waits for the follower. This is non-zero for produce requests when acks=all (75percentile).
Shown as millisecond
confluent.kafka.network.request.remote_time_ms.95percentile
(gauge)
Time the request waits for the follower. This is non-zero for produce requests when acks=all (95percentile).
Shown as millisecond
confluent.kafka.network.request.remote_time_ms.98percentile
(gauge)
Time the request waits for the follower. This is non-zero for produce requests when acks=all (98percentile).
Shown as millisecond
confluent.kafka.network.request.remote_time_ms.999percentile
(gauge)
Time the request waits for the follower. This is non-zero for produce requests when acks=all (99.9th percentile).
Shown as millisecond
confluent.kafka.network.request.remote_time_ms.99percentile
(gauge)
Time the request waits for the follower. This is non-zero for produce requests when acks=all (99percentile).
Shown as millisecond
confluent.kafka.network.request.remote_time_ms.avg
(gauge)
Time the request waits for the follower. This is non-zero for produce requests when acks=all (avg).
Shown as millisecond
confluent.kafka.network.request.remote_time_ms.rate
(gauge)
Time the request waits for the follower. This is non-zero for produce requests when acks=all (rate).
Shown as millisecond
confluent.kafka.network.request.request_queue_time_ms.50percentile
(gauge)
Time the request waits in the request queue (50percentile).
Shown as millisecond
confluent.kafka.network.request.request_queue_time_ms.75percentile
(gauge)
Time the request waits in the request queue (75percentile).
Shown as millisecond
confluent.kafka.network.request.request_queue_time_ms.95percentile
(gauge)
Time the request waits in the request queue (95percentile).
Shown as millisecond
confluent.kafka.network.request.request_queue_time_ms.98percentile
(gauge)
Time the request waits in the request queue (98percentile).
Shown as millisecond
confluent.kafka.network.request.request_queue_time_ms.999percentile
(gauge)
Time the request waits in the request queue (99.9th percentile).
Shown as millisecond
confluent.kafka.network.request.request_queue_time_ms.99percentile
(gauge)
Time the request waits in the request queue (99percentile).
Shown as millisecond
confluent.kafka.network.request.request_queue_time_ms.avg
(gauge)
Time the request waits in the request queue (avg).
Shown as millisecond
confluent.kafka.network.request.request_queue_time_ms.rate
(gauge)
Time the request waits in the request queue (rate).
Shown as millisecond
confluent.kafka.network.request.requests_per_sec.rate
(gauge)
Request rate.
Shown as request
confluent.kafka.network.request.response_queue_time_ms.50percentile
(gauge)
Time the request waits in the response queue (50percentile).
Shown as millisecond
confluent.kafka.network.request.response_queue_time_ms.75percentile
(gauge)
Time the request waits in the response queue (75percentile).
Shown as millisecond
confluent.kafka.network.request.response_queue_time_ms.95percentile
(gauge)
Time the request waits in the response queue (95percentile).
Shown as millisecond
confluent.kafka.network.request.response_queue_time_ms.98percentile
(gauge)
Time the request waits in the response queue (98percentile).
Shown as millisecond
confluent.kafka.network.request.response_queue_time_ms.999percentile
(gauge)
Time the request waits in the response queue (99.9th percentile).
Shown as millisecond
confluent.kafka.network.request.response_queue_time_ms.99percentile
(gauge)
Time the request waits in the response queue (99percentile).
Shown as millisecond
confluent.kafka.network.request.response_queue_time_ms.avg
(gauge)
Time the request waits in the response queue (avg).
Shown as millisecond
confluent.kafka.network.request.response_queue_time_ms.rate
(gauge)
Time the request waits in the response queue (rate).
Shown as millisecond
confluent.kafka.network.request.response_send_time_ms.50percentile
(gauge)
Time to send the response (50percentile).
Shown as millisecond
confluent.kafka.network.request.response_send_time_ms.75percentile
(gauge)
Time to send the response (75percentile).
Shown as millisecond
confluent.kafka.network.request.response_send_time_ms.95percentile
(gauge)
Time to send the response (95percentile).
Shown as millisecond
confluent.kafka.network.request.response_send_time_ms.98percentile
(gauge)
Time to send the response (98percentile).
Shown as millisecond
confluent.kafka.network.request.response_send_time_ms.999percentile
(gauge)
Time to send the response (99.9th percentile).
Shown as millisecond
confluent.kafka.network.request.response_send_time_ms.99percentile
(gauge)
Time to send the response (99percentile).
Shown as millisecond
confluent.kafka.network.request.response_send_time_ms.avg
(gauge)
Time to send the response (avg).
Shown as millisecond
confluent.kafka.network.request.response_send_time_ms.rate
(gauge)
Time to send the response (rate).
Shown as millisecond
confluent.kafka.network.request.total_time_ms.50percentile
(gauge)
Total time in ms to serve the specified request (50percentile).
Shown as millisecond
confluent.kafka.network.request.total_time_ms.75percentile
(gauge)
Total time in ms to serve the specified request (75percentile).
Shown as millisecond
confluent.kafka.network.request.total_time_ms.95percentile
(gauge)
Total time in ms to serve the specified request (95percentile).
Shown as millisecond
confluent.kafka.network.request.total_time_ms.98percentile
(gauge)
Total time in ms to serve the specified request (98percentile).
Shown as millisecond
confluent.kafka.network.request.total_time_ms.999percentile
(gauge)
Total time in ms to serve the specified request (99.9th percentile).
Shown as millisecond
confluent.kafka.network.request.total_time_ms.99percentile
(gauge)
Total time in ms to serve the specified request (99percentile).
Shown as millisecond
confluent.kafka.network.request.total_time_ms.avg
(gauge)
Total time in ms to serve the specified request (avg).
Shown as millisecond
confluent.kafka.network.request.total_time_ms.rate
(gauge)
Total time in ms to serve the specified request (rate).
Shown as millisecond
confluent.kafka.network.request_channel.request_queue_size
(gauge)
Size of the request queue. A congested request queue cannot process incoming or outgoing requests.
confluent.kafka.network.socket_server.network_processor_avg_idle_percent
(gauge)
Average fraction of time the network processor threads are idle
Shown as fraction
confluent.kafka.producer.batch_size_avg
(gauge)
The average number of bytes sent per partition per-request.
Shown as byte
confluent.kafka.producer.batch_size_max
(gauge)
The max number of bytes sent per partition per-request.
Shown as byte
confluent.kafka.producer.bufferpool_wait_time_total
(gauge)
The total time an appender waits for space allocation.
Shown as nanosecond
confluent.kafka.producer.connection_close_rate
(gauge)
Connections closed per second in the window.
Shown as connection
confluent.kafka.producer.connection_count
(gauge)
The current number of active connections on the producer.
Shown as connection
confluent.kafka.producer.connection_creation_rate
(gauge)
New connections established per second in the window.
Shown as connection
confluent.kafka.producer.incoming_byte_rate
(gauge)
The average number of incoming bytes received per second from all servers.
Shown as byte
confluent.kafka.producer.io_ratio
(gauge)
The fraction of time the producer I/O thread spent doing I/O
Shown as fraction
confluent.kafka.producer.io_time_ns_avg
(gauge)
The average length of time for I/O per select call in nanoseconds.
Shown as nanosecond
confluent.kafka.producer.io_wait_ratio
(gauge)
The fraction of time the producer I/O thread spent waiting
Shown as fraction
confluent.kafka.producer.io_wait_time_ns_avg
(gauge)
The average length of time the I/O thread spent waiting for a socket ready for reads or writes in nanoseconds.
Shown as nanosecond
confluent.kafka.producer.network_io_rate
(gauge)
The number of network operations (reads or writes) on all producer connections per second
Shown as operation
confluent.kafka.producer.node.incoming_byte_rate
(gauge)
The average number of bytes received per second from the broker.
Shown as byte
confluent.kafka.producer.node.outgoing_byte_rate
(gauge)
The average number of bytes sent per second to the broker.
Shown as byte
confluent.kafka.producer.node.request_rate
(gauge)
The average number of requests sent per second to the broker.
Shown as request
confluent.kafka.producer.node.request_size_avg
(gauge)
The average size of all requests in the window for a broker.
Shown as request
confluent.kafka.producer.node.request_size_max
(gauge)
The maximum size of any request sent in the window for a broker.
Shown as request
confluent.kafka.producer.node.response_rate
(gauge)
The average number of responses received per second from the broker.
Shown as response
confluent.kafka.producer.outgoing_byte_rate
(gauge)
The number of outgoing bytes sent to all servers per second
Shown as byte
confluent.kafka.producer.produce_throttle_time_avg
(gauge)
The average time in ms a request was throttled by a broker
Shown as millisecond
confluent.kafka.producer.produce_throttle_time_max
(gauge)
The maximum time in ms a request was throttled by a broker
Shown as millisecond
confluent.kafka.producer.record_error_rate
(gauge)
The average per-second number of record sends that resulted in errors
Shown as record
confluent.kafka.producer.record_retry_rate
(gauge)
The average per-second number of retried record sends
Shown as record
confluent.kafka.producer.request_latency_avg
(gauge)
The average producer request latency in ms
Shown as millisecond
confluent.kafka.producer.request_rate
(gauge)
The number of requests sent per second by a producer
Shown as request
confluent.kafka.producer.response_rate
(gauge)
The number of responses received per second by the producer
Shown as response
confluent.kafka.producer.select_rate
(gauge)
Number of times the I/O layer checked for new I/O to perform per second.
confluent.kafka.producer.topic.byte_rate
(gauge)
The average number of bytes sent per second for a topic.
Shown as byte
confluent.kafka.producer.topic.compression_rate
(gauge)
The average compression rate of record batches for a topic.
confluent.kafka.producer.topic.record_error_rate
(gauge)
The average per-second number of record sends that resulted in errors for a topic.
Shown as record
confluent.kafka.producer.topic.record_retry_rate
(gauge)
The average per-second number of retried record sends for a topic.
Shown as record
confluent.kafka.producer.topic.record_send_rate
(gauge)
The average number of records sent per second for a topic.
Shown as record
confluent.kafka.producer.waiting_threads
(gauge)
The number of user threads blocked waiting for buffer memory to enqueue their records
Shown as thread
confluent.kafka.rest.jersey.brokers.list.request_error_rate
(gauge)
The average rate of failed brokers list HTTP requests
Shown as request
confluent.kafka.rest.jersey.consumer.assign_v2.request_error_rate
(gauge)
The average rate of failed consumer assign v2 HTTP requests
Shown as request
confluent.kafka.rest.jersey.consumer.assignment_v2.request_error_rate
(gauge)
The average rate of failed consumer assignment v2 HTTP requests
Shown as request
confluent.kafka.rest.jersey.consumer.commit.request_error_rate
(gauge)
The average rate of failed consumer commit HTTP requests
Shown as request
confluent.kafka.rest.jersey.consumer.commit_offsets_v2.request_error_rate
(gauge)
The average rate of failed consumer commit offsets v2 HTTP requests
Shown as request
confluent.kafka.rest.jersey.consumer.committed_offsets_v2.request_error_rate
(gauge)
The average rate of failed consumer committed offsets v2 HTTP requests
Shown as request
confluent.kafka.rest.jersey.consumer.create.request_error_rate
(gauge)
The average rate of failed consumer create HTTP requests
Shown as request
confluent.kafka.rest.jersey.consumer.create_v2.request_error_rate
(gauge)
The average rate of failed consumer create v2 HTTP requests
Shown as request
confluent.kafka.rest.jersey.consumer.delete.request_error_rate
(gauge)
The average rate of failed consumer delete HTTP requests
Shown as request
confluent.kafka.rest.jersey.consumer.delete_v2.request_error_rate
(gauge)
The average rate of failed consumer delete v2 HTTP requests
Shown as request
confluent.kafka.rest.jersey.consumer.records.read_avro_v2.request_error_rate
(gauge)
The average rate of failed consumer records read avro v2 HTTP requests
Shown as request
confluent.kafka.rest.jersey.consumer.records.read_binary_v2.request_error_rate
(gauge)
The average rate of failed consumer records read binary v2 HTTP requests
Shown as request
confluent.kafka.rest.jersey.consumer.records.read_json_v2.request_error_rate
(gauge)
The average rate of failed consumer records read json v2 HTTP requests
Shown as request
confluent.kafka.rest.jersey.consumer.records.read_jsonschema_v2.request_error_rate
(gauge)
The average rate of failed consumer records read jsonschema v2 HTTP requests
Shown as request
confluent.kafka.rest.jersey.consumer.records.read_protobuf_v2.request_error_rate
(gauge)
The average rate of failed consumer records read protobuf v2 HTTP requests
Shown as request
confluent.kafka.rest.jersey.consumer.seek_to_beginning_v2.request_error_rate
(gauge)
The average rate of failed consumer seek to beginning v2 HTTP requests
Shown as request
confluent.kafka.rest.jersey.consumer.seek_to_end_v2.request_error_rate
(gauge)
The average rate of failed consumer seek to end v2 HTTP requests
Shown as request
confluent.kafka.rest.jersey.consumer.seek_to_offset_v2.request_error_rate
(gauge)
The average rate of failed consumer seek to offset v2 HTTP requests
Shown as request
confluent.kafka.rest.jersey.consumer.subscribe_v2.request_error_rate
(gauge)
The average rate of failed consumer subscribe v2 HTTP requests
Shown as request
confluent.kafka.rest.jersey.consumer.subscription_v2.request_error_rate
(gauge)
The average rate of failed consumer subscription v2 HTTP requests
Shown as request
confluent.kafka.rest.jersey.consumer.topic.read_avro.request_error_rate
(gauge)
The average rate of failed consumer topic read avro HTTP requests
Shown as request
confluent.kafka.rest.jersey.consumer.topic.read_binary.request_error_rate
(gauge)
The average rate of failed consumer topic read binary HTTP requests
Shown as request
confluent.kafka.rest.jersey.consumer.topic.read_json.request_error_rate
(gauge)
The average rate of failed consumer topic read json HTTP requests
Shown as request
confluent.kafka.rest.jersey.consumer.unsubscribe_v2.request_error_rate
(gauge)
The average rate of failed consumer unsubscribe v2 HTTP requests
Shown as request
confluent.kafka.rest.jersey.partition.consume_avro.request_error_rate
(gauge)
The average rate of failed partition consume avro HTTP requests
Shown as request
confluent.kafka.rest.jersey.partition.consume_binary.request_error_rate
(gauge)
The average rate of failed partition consume binary HTTP requests
Shown as request
confluent.kafka.rest.jersey.partition.consume_json.request_error_rate
(gauge)
The average rate of failed partition consume json HTTP requests
Shown as request
confluent.kafka.rest.jersey.partition.get.request_error_rate
(gauge)
The average rate of failed partition get HTTP requests
Shown as request
confluent.kafka.rest.jersey.partition.get_v2.request_error_rate
(gauge)
The average rate of failed partition get v2 HTTP requests
Shown as request
confluent.kafka.rest.jersey.partition.produce_avro.request_error_rate
(gauge)
The average rate of failed partition produce avro HTTP requests
Shown as request
confluent.kafka.rest.jersey.partition.produce_avro_v2.request_error_rate
(gauge)
The average rate of failed partition produce avro v2 HTTP requests
Shown as request
confluent.kafka.rest.jersey.partition.produce_binary.request_error_rate
(gauge)
The average rate of failed partition produce binary HTTP requests
Shown as request
confluent.kafka.rest.jersey.partition.produce_binary_v2.request_error_rate
(gauge)
The average rate of failed partition produce binary v2 HTTP requests
Shown as request
confluent.kafka.rest.jersey.partition.produce_json.request_error_rate
(gauge)
The average rate of failed partition produce json HTTP requests
Shown as request
confluent.kafka.rest.jersey.partition.produce_json_v2.request_error_rate
(gauge)
The average rate of failed partition produce json v2 HTTP requests
Shown as request
confluent.kafka.rest.jersey.partition.produce_jsonschema_v2.request_error_rate
(gauge)
The average rate of failed partition produce jsonschema v2 HTTP requests
Shown as request
confluent.kafka.rest.jersey.partition.produce_protobuf_v2.request_error_rate
(gauge)
The average rate of failed partition produce protobuf v2 HTTP requests
Shown as request
confluent.kafka.rest.jersey.partitions.list.request_error_rate
(gauge)
The average rate of failed partitions list HTTP requests
Shown as request
confluent.kafka.rest.jersey.partitions.list_v2.request_error_rate
(gauge)
The average rate of failed partitions list v2 HTTP requests
Shown as request
confluent.kafka.rest.jersey.request_error_rate
(gauge)
The average rate of failed HTTP requests
Shown as request
confluent.kafka.rest.jersey.root.get.request_error_rate
(gauge)
The average rate of failed root get HTTP requests
Shown as request
confluent.kafka.rest.jersey.root.get_v2.request_error_rate
(gauge)
The average rate of failed root get v2 HTTP requests
Shown as request
confluent.kafka.rest.jersey.root.post.request_error_rate
(gauge)
The average rate of failed root post HTTP requests
Shown as request
confluent.kafka.rest.jersey.root.post_v2.request_error_rate
(gauge)
The average rate of failed root post v2 HTTP requests
Shown as request
confluent.kafka.rest.jersey.topic.get.request_error_rate
(gauge)
The average rate of failed topic get HTTP requests
Shown as request
confluent.kafka.rest.jersey.topic.get_v2.request_error_rate
(gauge)
The average rate of failed topic get v2 HTTP requests
Shown as request
confluent.kafka.rest.jersey.topic.produce_avro.request_error_rate
(gauge)
The average rate of failed topic produce avro HTTP requests
Shown as request
confluent.kafka.rest.jersey.topic.produce_binary.request_error_rate
(gauge)
The average rate of failed topic produce binary HTTP requests
Shown as request
confluent.kafka.rest.jersey.topic.produce_json.request_error_rate
(gauge)
The average rate of failed topic produce json HTTP requests
Shown as request
confluent.kafka.rest.jersey.topics.list.request_error_rate
(gauge)
The average rate of failed topics list HTTP requests
Shown as request
confluent.kafka.rest.jersey.topics.list_v2.request_error_rate
(gauge)
The average rate of failed topics list v2 HTTP requests
Shown as request
confluent.kafka.rest.jetty.connections_active
(gauge)
Total number of active TCP connections (REST).
Shown as connection
confluent.kafka.rest.jetty.connections_closed_rate
(gauge)
The average rate per second of closed TCP connections (REST).
Shown as connection
confluent.kafka.rest.jetty.connections_opened_rate
(gauge)
The average rate per second of opened TCP connections (REST).
Shown as connection
confluent.kafka.schema.registry.avro_schemas_created
(gauge)
Number of Avro schemas created
confluent.kafka.schema.registry.avro_schemas_deleted
(gauge)
Number of Avro schemas deleted
confluent.kafka.schema.registry.jersey.brokers.list.request_error_rate
(gauge)
The average rate of failed brokers list operations
Shown as request
confluent.kafka.schema.registry.jersey.brokers.list_v2.request_error_rate
(gauge)
The average rate of failed brokers list v2 operations
Shown as request
confluent.kafka.schema.registry.jersey.consumer.assign_v2.request_error_rate
(gauge)
The average rate of failed consumer assign v2 operations
Shown as request
confluent.kafka.schema.registry.jersey.consumer.assignment_v2.request_error_rate
(gauge)
The average rate of failed consumer assignment v2 operations
Shown as request
confluent.kafka.schema.registry.jersey.consumer.commit.request_error_rate
(gauge)
The average rate of failed consumer commit operations
Shown as request
confluent.kafka.schema.registry.jersey.consumer.commit_offsets_v2.request_error_rate
(gauge)
The average rate of failed consumer commit offsets v2 operations
Shown as request
confluent.kafka.schema.registry.jersey.consumer.committed_offsets_v2.request_error_rate
(gauge)
The average rate of failed consumer committed offsets v2 operations
Shown as request
confluent.kafka.schema.registry.jersey.consumer.create.request_error_rate
(gauge)
The average rate of failed consumer create operations
Shown as request
confluent.kafka.schema.registry.jersey.consumer.create_v2.request_error_rate
(gauge)
The average rate of failed consumer create v2 operations
Shown as request
confluent.kafka.schema.registry.jersey.consumer.delete.request_error_rate
(gauge)
The average rate of failed consumer delete operations
Shown as request
confluent.kafka.schema.registry.jersey.consumer.delete_v2.request_error_rate
(gauge)
The average rate of failed consumer delete v2 operations
Shown as request
confluent.kafka.schema.registry.jersey.consumer.records.read_avro_v2.request_error_rate
(gauge)
The average rate of failed consumer records read avro v2 operations
Shown as request
confluent.kafka.schema.registry.jersey.consumer.records.read_binary_v2.request_error_rate
(gauge)
The average rate of failed consumer records read binary v2 operations
Shown as request
confluent.kafka.schema.registry.jersey.consumer.records.read_json_v2.request_error_rate
(gauge)
The average rate of failed consumer records read json v2 operations
Shown as request
confluent.kafka.schema.registry.jersey.consumer.seek_to_beginning_v2.request_error_rate
(gauge)
The average rate of failed consumer seek to beginning v2 operations
Shown as request
confluent.kafka.schema.registry.jersey.consumer.seek_to_end_v2.request_error_rate
(gauge)
The average rate of failed consumer seek to end v2 operations
Shown as request
confluent.kafka.schema.registry.jersey.consumer.seek_to_offset_v2.request_error_rate
(gauge)
The average rate of failed consumer seek to offset v2 operations
Shown as request
confluent.kafka.schema.registry.jersey.consumer.subscribe_v2.request_error_rate
(gauge)
The average rate of failed consumer subscribe v2 operations
Shown as request
confluent.kafka.schema.registry.jersey.consumer.subscription_v2.request_error_rate
(gauge)
The average rate of failed consumer subscription v2 operations
Shown as request
confluent.kafka.schema.registry.jersey.consumer.topic.read_avro.request_error_rate
(gauge)
The average rate of failed consumer topic read avro operations
Shown as request
confluent.kafka.schema.registry.jersey.consumer.topic.read_binary.request_error_rate
(gauge)
The average rate of failed consumer topic read binary operations
Shown as request
confluent.kafka.schema.registry.jersey.consumer.topic.read_json.request_error_rate
(gauge)
The average rate of failed consumer topic read json operations
Shown as request
confluent.kafka.schema.registry.jersey.consumer.unsubscribe_v2.request_error_rate
(gauge)
The average rate of failed consumer unsubscribe v2 operations
Shown as request
confluent.kafka.schema.registry.jersey.partition.consume_avro.request_error_rate
(gauge)
The average rate of failed partition consume avro operations
Shown as request
confluent.kafka.schema.registry.jersey.partition.consume_binary.request_error_rate
(gauge)
The average rate of failed partition consume binary operations
Shown as request
confluent.kafka.schema.registry.jersey.partition.consume_json.request_error_rate
(gauge)
The average rate of failed partition consume json operations
Shown as request
confluent.kafka.schema.registry.jersey.partition.get.request_error_rate
(gauge)
The average rate of failed partition get operations
Shown as request
confluent.kafka.schema.registry.jersey.partition.get_v2.request_error_rate
(gauge)
The average rate of failed partition get v2 operations
Shown as request
confluent.kafka.schema.registry.jersey.partition.produce_avro.request_error_rate
(gauge)
The average rate of failed partition produce avro operations
Shown as request
confluent.kafka.schema.registry.jersey.partition.produce_avro_v2.request_error_rate
(gauge)
The average rate of failed partition produce avro v2 operations
Shown as request
confluent.kafka.schema.registry.jersey.partition.produce_binary.request_error_rate
(gauge)
The average rate of failed partition produce binary operations
Shown as request
confluent.kafka.schema.registry.jersey.partition.produce_binary_v2.request_error_rate
(gauge)
The average rate of failed partition produce binary v2 operations
Shown as request
confluent.kafka.schema.registry.jersey.partition.produce_json.request_error_rate
(gauge)
The average rate of failed partition produce json operations
Shown as request
confluent.kafka.schema.registry.jersey.partition.produce_json_v2.request_error_rate
(gauge)
The average rate of failed partition produce json v2 operations
Shown as request
confluent.kafka.schema.registry.jersey.partitions.list.request_error_rate
(gauge)
The average rate of failed partitions list operations
Shown as request
confluent.kafka.schema.registry.jersey.partitions.list_v2.request_error_rate
(gauge)
The average rate of failed partitions list v2 operations
Shown as request
confluent.kafka.schema.registry.jersey.request_error_rate
(gauge)
The average rate of failed operations
Shown as request
confluent.kafka.schema.registry.jersey.root.get.request_error_rate
(gauge)
The average rate of failed root get operations
Shown as request
confluent.kafka.schema.registry.jersey.root.post.request_error_rate
(gauge)
The average rate of failed root post operations
Shown as request
confluent.kafka.schema.registry.jersey.topic.get.request_error_rate
(gauge)
The average rate of failed topic get operations
Shown as request
confluent.kafka.schema.registry.jersey.topic.produce_avro.request_error_rate
(gauge)
The average rate of failed topic produce avro operations
Shown as request
confluent.kafka.schema.registry.jersey.topic.produce_binary.request_error_rate
(gauge)
The average rate of failed topic produce binary operations
Shown as request
confluent.kafka.schema.registry.jersey.topic.produce_json.request_error_rate
(gauge)
The average rate of failed topic produce json operations
Shown as request
confluent.kafka.schema.registry.jersey.topics.list.request_error_rate
(gauge)
The average rate of failed topics list operations
Shown as request
confluent.kafka.schema.registry.jetty.connections_active
(gauge)
Total number of active TCP connections (Schema registry).
Shown as connection
confluent.kafka.schema.registry.jetty.connections_closed_rate
(gauge)
The average rate per second of closed TCP connections (Schema registry).
Shown as connection
confluent.kafka.schema.registry.jetty.connections_opened_rate
(gauge)
The average rate per second of opened TCP connections (Schema registry).
Shown as connection
confluent.kafka.schema.registry.json_schemas_created
(gauge)
Number of JSON schemas created
confluent.kafka.schema.registry.json_schemas_deleted
(gauge)
Number of JSON schemas deleted
confluent.kafka.schema.registry.master_slave_role.master_slave_role
(gauge)
The current role of this Schema Registry instance. A value of 1 indicates this instance is the primary, 0 indicates it is a secondary.
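Because the role is reported as 1 or 0 per instance, the latest values from all instances can be combined into a quick sanity check. The following is a minimal sketch, not part of the integration: the function names and the instance names are hypothetical, and it assumes the deployment is expected to have exactly one primary at any time.

```python
def count_primaries(role_by_instance):
    """Count instances whose master_slave_role gauge currently reports 1 (primary)."""
    return sum(1 for role in role_by_instance.values() if role == 1)


def primary_status(role_by_instance):
    """Classify the cluster from the per-instance role values (assumes one primary is expected)."""
    primaries = count_primaries(role_by_instance)
    if primaries == 1:
        return "ok"
    if primaries == 0:
        return "no_primary"
    return "multiple_primaries"


# Hypothetical instance names mapped to the latest gauge value.
print(primary_status({"schema-registry-1": 1, "schema-registry-2": 0}))
```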
confluent.kafka.schema.registry.protobuf_schemas_created
(gauge)
Number of Protobuf schemas created
confluent.kafka.schema.registry.protobuf_schemas_deleted
(gauge)
Number of Protobuf schemas deleted
confluent.kafka.schema.registry.registered_count
(gauge)
Number of registered schemas
confluent.kafka.server.broker_topic_metrics.bytes_in_per_sec
(gauge)
Total bytes in per topic
Shown as byte
confluent.kafka.server.broker_topic_metrics.bytes_out_per_sec
(gauge)
Total bytes out per topic
Shown as byte
confluent.kafka.server.broker_topic_metrics.fetch_message_conversions_per_sec
(gauge)
Total messages converted at consumption (fetch) per topic
Shown as message
confluent.kafka.server.broker_topic_metrics.messages_in_per_sec
(gauge)
Total messages in per topic
Shown as message
confluent.kafka.server.broker_topic_metrics.messages_out_per_sec
(gauge)
Total messages out per topic
Shown as message
confluent.kafka.server.broker_topic_metrics.produce_message_conversions_per_sec
(gauge)
Total messages converted at production per topic
Shown as message
confluent.kafka.server.delayed_operation_purgatory.purgatory_size
(gauge)
Number of requests waiting in the fetch purgatory. This is high if consumers use a large value for fetch.wait.max.ms.
Shown as request
confluent.kafka.server.fetcher_lag.consumer_lag
(gauge)
Lag in number of messages per follower replica. This is useful to know if the replica is slow or has stopped replicating from the leader.
Shown as message
confluent.kafka.server.produce.delay_queue_size
(gauge)
Number of producer clients currently being throttled. The value can be any number greater than or equal to 0.
confluent.kafka.server.replica_fetcher_manager.max_lag
(gauge)
Maximum lag in messages between the follower and leader replicas. This is controlled by the replica.lag.max.messages config.
Shown as message
confluent.kafka.server.replica_manager.isr_expands_per_sec.rate
(gauge)
Rate at which the pool of in-sync replicas (ISRs) expands.
Shown as unit
confluent.kafka.server.replica_manager.isr_shrinks_per_sec.rate
(gauge)
Rate at which the pool of in-sync replicas (ISRs) shrinks.
Shown as unit
confluent.kafka.server.replica_manager.leader_count
(gauge)
Number of leaders on this broker. This should be mostly even across all brokers. If not, set auto.leader.rebalance.enable to true on all brokers in the cluster.
confluent.kafka.server.replica_manager.partition_count
(gauge)
Number of partitions on this broker. This should be mostly even across all brokers.
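Since leader_count and partition_count should both be mostly even across brokers, one way to quantify "mostly even" is the spread of the per-broker values relative to their mean. This is a rough sketch only: the function name and broker names are hypothetical, and any alerting threshold applied to the result is a choice for the operator rather than something defined by this integration.

```python
def relative_spread(count_by_broker):
    """Return (max - min) / mean of per-broker counts; 0.0 means perfectly even."""
    counts = list(count_by_broker.values())
    if not counts:
        return 0.0
    mean = sum(counts) / len(counts)
    if mean == 0:
        return 0.0
    return (max(counts) - min(counts)) / mean


# Hypothetical leader_count values reported by three brokers.
leaders = {"broker-1": 40, "broker-2": 42, "broker-3": 38}
print(relative_spread(leaders))
```

The same function applies to partition_count values collected per broker.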
confluent.kafka.server.replica_manager.under_min_isr_partition_count
(gauge)
Number of partitions whose in-sync replicas count is less than minIsr.
confluent.kafka.server.replica_manager.under_replicated_partitions
(gauge)
Number of under-replicated partitions (ISR < all replicas).
confluent.kafka.server.request_handler_pool.avg_idle_percent
(gauge)
Average fraction of time the request handler threads are idle. Values are between 0 (all resources are used) and 1 (all resources are available)
Shown as fraction
confluent.kafka.server.request_handler_pool.avg_idle_percent.rate
(gauge)
Number of nanoseconds where the request handler threads were idle during the last second. Values are between 0 (all resources are used) and 10^9 (all resources are available)
Shown as fraction
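The rate variant above is described as idle nanoseconds per second (0 to 10^9), while the plain gauge is a 0 to 1 fraction. A small conversion makes the two comparable. This is an illustrative sketch; the function names are hypothetical and not part of the Agent.

```python
NANOS_PER_SECOND = 1_000_000_000


def idle_fraction_from_nanos(idle_ns_per_second):
    """Convert idle nanoseconds per second into an idle fraction clamped to [0, 1]."""
    fraction = idle_ns_per_second / NANOS_PER_SECOND
    return max(0.0, min(1.0, fraction))


def busy_fraction(idle_fraction):
    """Share of time the request handler threads were busy."""
    return 1.0 - idle_fraction


idle = idle_fraction_from_nanos(250_000_000)
print(idle, busy_fraction(idle))
```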
confluent.kafka.server.session.zoo_keeper_auth_failures_per_sec.rate
(gauge)
An attempt to connect to the ensemble failed because the client has not provided correct credentials.
confluent.kafka.server.session.zoo_keeper_disconnects_per_sec.rate
(gauge)
ZooKeeper client is currently disconnected from the ensemble. The client lost its previous connection to a server and it is currently trying to reconnect. The session is not necessarily expired.
Shown as unit
confluent.kafka.server.session.zoo_keeper_expires_per_sec.rate
(gauge)
The ZooKeeper session has expired. When a session expires, we can have leader changes and even a new controller.
Shown as unit
confluent.kafka.server.session.zoo_keeper_read_only_connects_per_sec.rate
(gauge)
The server the client is connected to is currently LOOKING, which means that it is neither FOLLOWING nor LEADING.
Shown as unit
confluent.kafka.server.session.zoo_keeper_request_latency_ms
(gauge)
Client request latency
Shown as unit
confluent.kafka.server.session.zoo_keeper_sasl_authentications_per_sec.rate
(gauge)
Client has successfully authenticated.
Shown as unit
confluent.kafka.server.session.zoo_keeper_sync_connects_per_sec.rate
(gauge)
ZooKeeper client is connected to the ensemble and ready to execute operations.
Shown as unit
confluent.kafka.server.topic.bytes_in_per_sec.rate
(gauge)
Aggregate incoming byte rate.
Shown as byte
confluent.kafka.server.topic.bytes_out_per_sec.rate
(gauge)
Aggregate outgoing byte rate.
Shown as byte
confluent.kafka.server.topic.bytes_rejected_per_sec.rate
(gauge)
Aggregate rejected byte rate.
Shown as byte
confluent.kafka.server.topic.failed_fetch_requests_per_sec.rate
(gauge)
Fetch request rate for requests that failed.
Shown as request
confluent.kafka.server.topic.failed_produce_requests_per_sec.rate
(gauge)
Produce request rate for requests that failed.
Shown as request
confluent.kafka.server.topic.messages_in_per_sec.rate
(gauge)
Aggregate incoming message rate.
Shown as message
confluent.kafka.server.topic.total_fetch_requests_per_sec.rate
(gauge)
Fetch request rate.
Shown as request
confluent.kafka.server.topic.total_produce_requests_per_sec.rate
(gauge)
Produce request rate.
Shown as request
confluent.kafka.streams.processor_node.forward_rate
(gauge)
The average rate of records being forwarded downstream, from source nodes only, per second. This metric can be used to understand how fast the library is consuming from source topics.
Shown as record
confluent.kafka.streams.processor_node.forward_total
(gauge)
The total number of records being forwarded downstream, from source nodes only.
Shown as record
confluent.kafka.streams.processor_node.process_latency_avg
(gauge)
The average execution time in ns, for the respective operation.
Shown as nanosecond
confluent.kafka.streams.processor_node.process_rate
(gauge)
The average number of respective operations per second.
Shown as operation
confluent.kafka.streams.processor_node.process_total
(gauge)
The total number of respective operations.
Shown as operation
confluent.kafka.streams.processor_node.suppression_emit_rate
(gauge)
The rate at which records have been emitted downstream from suppression operation nodes. Compare with the process-rate metric to determine how many updates are being suppressed.
Shown as record
confluent.kafka.streams.processor_node.suppression_emit_total
(gauge)
The total number of records that have been emitted downstream from suppression operation nodes. Compare with the process-total metric to determine how many updates are being suppressed.
Shown as record
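The comparison suggested for the suppression metrics can be written as simple arithmetic: updates suppressed are those processed but not emitted. The sketch below assumes both totals were sampled for the same suppression node at the same time; the function names are hypothetical.

```python
def suppressed_updates(process_total, suppression_emit_total):
    """Records processed by the suppression node but not emitted downstream."""
    return max(0.0, process_total - suppression_emit_total)


def suppression_ratio(process_total, suppression_emit_total):
    """Fraction of processed records that were suppressed (0.0 when nothing was processed)."""
    if process_total <= 0:
        return 0.0
    return suppressed_updates(process_total, suppression_emit_total) / process_total


print(suppressed_updates(1000, 250), suppression_ratio(1000, 250))
```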
confluent.kafka.streams.stream.commit_latency_avg
(gauge)
The average value of commit-latency.
Shown as nanosecond
confluent.kafka.streams.stream.commit_latency_max
(gauge)
The maximum value of commit-latency.
Shown as nanosecond
confluent.kafka.streams.stream.commit_rate
(gauge)
The average per-second number of commit calls
Shown as commit
confluent.kafka.streams.stream.commit_total
(gauge)
The total number of commit calls
Shown as commit
confluent.kafka.streams.stream.poll_latency_avg
(gauge)
The average value of poll-latency.
Shown as nanosecond
confluent.kafka.streams.stream.poll_latency_max
(gauge)
The maximum value of poll-latency.
Shown as nanosecond
confluent.kafka.streams.stream.poll_rate
(gauge)
The average per-second number of poll calls
Shown as unit
confluent.kafka.streams.stream.poll_total
(gauge)
The total number of poll calls
Shown as unit
confluent.kafka.streams.stream.process_latency_avg
(gauge)
The average value of process-latency.
Shown as nanosecond
confluent.kafka.streams.stream.process_latency_max
(gauge)
The maximum value of process-latency.
Shown as nanosecond
confluent.kafka.streams.stream.process_rate
(gauge)
The average per-second number of process calls
Shown as unit
confluent.kafka.streams.stream.process_total
(gauge)
The total number of process calls
Shown as unit
confluent.kafka.streams.stream.punctuate_latency_avg
(gauge)
The average value of punctuate-latency.
Shown as nanosecond
confluent.kafka.streams.stream.punctuate_latency_max
(gauge)
The maximum value of punctuate-latency.
Shown as nanosecond
confluent.kafka.streams.stream.punctuate_rate
(gauge)
The average per-second number of punctuate calls
Shown as unit
confluent.kafka.streams.stream.punctuate_total
(gauge)
The total number of punctuate calls
Shown as unit
confluent.kafka.streams.stream.skipped_records_rate
(gauge)
The average per-second number of skipped records
Shown as record
confluent.kafka.streams.stream.skipped_records_total
(gauge)
The total number of skipped records
Shown as record
confluent.kafka.streams.stream.task_closed_rate
(gauge)
The average per-second number of closed tasks
Shown as task
confluent.kafka.streams.stream.task_closed_total
(gauge)
The total number of closed tasks
Shown as task
confluent.kafka.streams.stream.task_created_rate
(gauge)
The average per-second number of newly created tasks
Shown as task
confluent.kafka.streams.stream.task_created_total
(gauge)
The total number of newly created tasks
Shown as task
confluent.kafka.streams.task.commit_latency_avg
(gauge)
The average value of task commit-latency.
Shown as nanosecond
confluent.kafka.streams.task.commit_rate
(gauge)
The average per-second number of commit calls over all tasks
Shown as unit
confluent.kafka.streams.task.record_lateness_avg
(gauge)
The average value of record-lateness.
Shown as millisecond
confluent.ksql.consumer_metrics.consumer_messages_per_sec
(gauge)
consumer_messages_per_sec
confluent.ksql.consumer_metrics.consumer_total_bytes
(gauge)
consumer_total_bytes
confluent.ksql.consumer_metrics.consumer_total_messages
(gauge)
consumer_total_messages
confluent.ksql.ksql_rocksdb_aggregates.block_cache_pinned_usage_max
(gauge)
block_cache_pinned_usage_max
Shown as byte
confluent.ksql.ksql_rocksdb_aggregates.block_cache_pinned_usage_total
(gauge)
block_cache_pinned_usage_total
Shown as byte
confluent.ksql.ksql_rocksdb_aggregates.block_cache_usage_max
(gauge)
block cache usage
Shown as byte
confluent.ksql.ksql_rocksdb_aggregates.block_cache_usage_total
(gauge)
block_cache_usage_total
Shown as byte
confluent.ksql.ksql_rocksdb_aggregates.compaction_pending_total
(gauge)
compaction_pending_total
confluent.ksql.ksql_rocksdb_aggregates.cur_size_active_mem_table_total
(gauge)
cur_size_active_mem_table_total
Shown as byte
confluent.ksql.ksql_rocksdb_aggregates.cur_size_all_mem_tables_total
(gauge)
size all mem tables
Shown as byte
confluent.ksql.ksql_rocksdb_aggregates.estimate_num_keys_total
(gauge)
estimate_num_keys_total
confluent.ksql.ksql_rocksdb_aggregates.estimate_pending_compaction_bytes_total
(gauge)
estimate_pending_compaction_bytes_total
confluent.ksql.ksql_rocksdb_aggregates.estimate_table_readers_mem_total
(gauge)
estimate_table_readers_mem_total
Shown as byte
confluent.ksql.ksql_rocksdb_aggregates.live_sst_files_size_total
(gauge)
live_sst_files_size_total
Shown as byte
confluent.ksql.ksql_rocksdb_aggregates.mem_table_flush_pending_total
(gauge)
mem_table_flush_pending_total
Shown as byte
confluent.ksql.ksql_rocksdb_aggregates.num_deletes_active_mem_table_total
(gauge)
num_deletes_active_mem_table_total
confluent.ksql.ksql_rocksdb_aggregates.num_deletes_imm_mem_tables_total
(gauge)
delete all mem tables
confluent.ksql.ksql_rocksdb_aggregates.num_entries_active_mem_table_total
(gauge)
entries all mem tables
confluent.ksql.ksql_rocksdb_aggregates.num_entries_imm_mem_tables_total
(gauge)
num_entries_imm_mem_tables_total
Shown as byte
confluent.ksql.ksql_rocksdb_aggregates.num_immutable_mem_table_total
(gauge)
immutable mem table total
confluent.ksql.ksql_rocksdb_aggregates.num_running_compactions_total
(gauge)
num_running_compactions
confluent.ksql.ksql_rocksdb_aggregates.num_running_flushes_total
(gauge)
running flushes total
confluent.ksql.ksql_rocksdb_aggregates.total_sst_files_size_total
(gauge)
total_sst_files_size_total
Shown as byte
confluent.ksql.producer_metrics.messages_per_sec
(gauge)
messages_per_sec
confluent.ksql.producer_metrics.total_messages
(gauge)
total messages
confluent.ksql.pull_query_metrics.pull_query_requests_error_rate
(gauge)
pullqueryrequestserrorrate
confluent.ksql.pull_query_metrics.pull_query_requests_error_total
(gauge)
pullqueryrequestserrortotal
confluent.ksql.pull_query_metrics.pull_query_requests_latency_distribution_50
(gauge)
pullqueryrequestslatencydistribution_50
confluent.ksql.pull_query_metrics.pull_query_requests_latency_distribution_75
(gauge)
pull_query_requests_latency_distribution_75
confluent.ksql.pull_query_metrics.pull_query_requests_latency_distribution_90
(gauge)
pull_query_requests_latency_distribution_90
confluent.ksql.pull_query_metrics.pull_query_requests_latency_distribution_99
(gauge)
pull_query_requests_latency_distribution_99
confluent.ksql.pull_query_metrics.pull_query_requests_latency_latency_avg
(gauge)
pull_query_requests_latency_latency_avg
confluent.ksql.pull_query_metrics.pull_query_requests_latency_latency_max
(gauge)
pull_query_requests_latency_latency_max
confluent.ksql.pull_query_metrics.pull_query_requests_latency_latency_min
(gauge)
pull_query_requests_latency_latency_min
confluent.ksql.pull_query_metrics.pull_query_requests_local
(gauge)
pull_query_requests_local
confluent.ksql.pull_query_metrics.pull_query_requests_local_rate
(gauge)
pull_query_requests_local_rate
confluent.ksql.pull_query_metrics.pull_query_requests_rate
(gauge)
pull_query_requests_rate
confluent.ksql.pull_query_metrics.pull_query_requests_remote
(gauge)
pull_query_requests_remote
confluent.ksql.pull_query_metrics.pull_query_requests_remote_rate
(gauge)
pull_query_requests_remote_rate
confluent.ksql.pull_query_metrics.pull_query_requests_total
(gauge)
pull_query_requests_total
confluent.ksql.query_stats.bytes_consumed_total
(gauge)
Number of bytes consumed across all queries.
Shown as byte
confluent.ksql.query_stats.created_queries
(gauge)
CREATED_queries
Shown as message
confluent.ksql.query_stats.error_queries
(gauge)

Shown as message
confluent.ksql.query_stats.error_rate
(gauge)
Number of messages that have been consumed but not processed across all queries.
Shown as message
confluent.ksql.query_stats.messages_consumed_avg
(gauge)
Average number of messages consumed by a query per second.
Shown as message
confluent.ksql.query_stats.messages_consumed_max
(gauge)
Number of messages consumed per second for the query with the most messages consumed per second.
Shown as message
confluent.ksql.query_stats.messages_consumed_min
(gauge)
Number of messages consumed per second for the query with the fewest messages consumed per second.
Shown as message
confluent.ksql.query_stats.messages_consumed_per_sec
(gauge)
Number of messages consumed per second across all queries.
Shown as message
confluent.ksql.query_stats.messages_consumed_total
(gauge)
Number of messages consumed across all queries.
Shown as message
confluent.ksql.query_stats.messages_produced_per_sec
(gauge)
Number of messages produced per second across all queries.
Shown as message
confluent.ksql.query_stats.not_running_queries
(gauge)
NOT_RUNNING_queries
Shown as message
confluent.ksql.query_stats.num_active_queries
(gauge)
Number of queries that are actively processing messages.
Shown as message
confluent.ksql.query_stats.num_idle_queries
(gauge)
Number of queries with no messages available to process.
Shown as message
confluent.ksql.query_stats.num_persistent_queries
(gauge)
Number of persistent queries that are currently executing.
Shown as query
confluent.ksql.query_stats.pending_shutdown_queries
(gauge)
PENDING_SHUTDOWN_queries
Shown as message
confluent.ksql.query_stats.rebalancing_queries
(gauge)
REBALANCING_queries
Shown as message
confluent.ksql.query_stats.running_queries
(gauge)
RUNNING_queries
Shown as message
confluent.replicator.task.topic_partition_latency
(gauge)
The average time between message production to the source cluster and message production to the destination cluster.
confluent.replicator.task.topic_partition_message_lag
(gauge)
The number of messages that were produced to the source cluster but have not yet arrived at the destination cluster.
Shown as message
confluent.replicator.task.topic_partition_throughput
(gauge)
The number of messages replicated per second from the source to destination cluster.
Shown as message

Events

The Confluent Platform check does not include any events.

Service Checks

confluent.can_connect
Returns CRITICAL if the Agent is unable to connect to and collect metrics from the monitored Confluent Platform component instance, WARNING if no metrics are collected, and OK otherwise.
Statuses: ok, critical, warning
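
The status rules described above can be sketched as a small decision function. This is an illustrative sketch only, not the Agent's actual implementation: the `Status` enum, the `can_connect_status` helper, and its `connected` and `metrics_collected` parameters are hypothetical names introduced here to make the three outcomes explicit.

```python
from enum import Enum


class Status(Enum):
    """The three statuses listed for confluent.can_connect."""
    OK = "ok"
    WARNING = "warning"
    CRITICAL = "critical"


def can_connect_status(connected: bool, metrics_collected: int) -> Status:
    """Hypothetical helper mirroring the documented status rules.

    connected: whether the Agent could reach the component's JMX endpoint.
    metrics_collected: how many metrics the check gathered in this run.
    """
    if not connected:
        # Unable to connect to the monitored instance.
        return Status.CRITICAL
    if metrics_collected == 0:
        # Connection succeeded, but no metrics were collected.
        return Status.WARNING
    return Status.OK


if __name__ == "__main__":
    # Sample inputs are made up for illustration.
    for connected, count in [(False, 0), (True, 0), (True, 350)]:
        print(connected, count, can_connect_status(connected, count).value)
```

Under these assumptions, a WARNING usually points at a reachable JVM that exposes none of the expected metrics, while a CRITICAL points at the connection itself (host, port, or credentials).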

Troubleshooting

Need help? Contact Datadog support.
