HBase Master

Supported OS: Linux, Mac OS, Windows

Integration version: 1.1.1

Overview

Get metrics from the HBase Master service in real time to:

  • Visualize and monitor HBase Master states.
  • Be notified about HBase Master failovers and events.

Setup

The HBase Master check is not included in the Datadog Agent package, so you need to install it.

Installation

For Agent v7.21+ / v6.21+, follow the instructions below to install the HBase Master check on your host. See Use Community Integrations to install with the Docker Agent or earlier versions of the Agent.

  1. Run the following command to install the Agent integration:

    datadog-agent integration install -t datadog-hbase_master==<INTEGRATION_VERSION>
    
  2. Configure your integration similar to core integrations.

Configuration

  1. Edit the hbase_master.d/conf.yaml file in the conf.d/ folder at the root of your Agent’s configuration directory to start collecting your HBase Master metrics. See the sample hbase_master.d/conf.yaml for all available configuration options.

    NOTE: If using Agent 6, be sure to modify the hbase_master.d/metrics.yaml file and wrap boolean keys in quotes.

      - include:
          domain: Hadoop
          bean:
            - Hadoop:service=HBase,name=Master,sub=Server
          attribute:
            # Is Active Master
            tag.isActiveMaster:
               metric_type: gauge
               alias: hbase.master.server.tag.is_active_master
               values: {"true": 1, "false": 0, default: 0}
    
  2. Restart the Agent.
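
As a rough illustration of step 1, a minimal hbase_master.d/conf.yaml might look like the sketch below. It assumes the check is JMX-based (the metrics.yaml entry above refers to JMX beans) and that the HBase Master exposes JMX on localhost port 10101. The host and port are placeholders, not values confirmed by this page; the sample hbase_master.d/conf.yaml remains the reference for the available options.

```yaml
init_config:
  # Mark the check as JMX-based and load the metrics defined in metrics.yaml.
  is_jmx: true
  collect_default_metrics: true

instances:
  # Assumed endpoint: replace with the JMX host and port of your HBase Master.
  - host: localhost
    port: 10101
```

The port must match whatever JMX remote port the HBase Master JVM was started with.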

Log collection

  1. Log collection is disabled by default in the Datadog Agent. Enable it in datadog.yaml:

    logs_enabled: true
    
  2. Add this configuration block to your hbase_master.d/conf.yaml file to start collecting your HBase Master logs:

    logs:
      - type: file
        path: /path/to/my/directory/file.log
        source: hbase
    

    Change the path parameter value and configure it for your environment. See the sample hbase_master.d/conf.yaml for all available configuration options.

  3. Restart the Agent.
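
HBase logs often contain multi-line Java stack traces. The sketch below extends the block from step 2 with a multi_line processing rule so that such lines stay attached to the entry that produced them. It assumes each new log entry starts with a yyyy-MM-dd date, which depends on your log4j layout; the service value is a hypothetical name chosen for illustration.

```yaml
logs:
  - type: file
    path: /path/to/my/directory/file.log
    source: hbase
    # Hypothetical service name used to group these logs.
    service: hbase-master
    log_processing_rules:
      # Lines that do not start with a date are appended to the
      # previous entry (for example, stack trace lines).
      - type: multi_line
        name: new_log_start_with_date
        pattern: \d{4}-\d{2}-\d{2}
```

If your log lines begin with a different timestamp format, adjust the pattern to match.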

Validation

Run the Agent’s status subcommand and look for hbase_master under the Checks section.

Data Collected

Metrics

hbase.master.assignmentmanager.rit_oldest_age
(gauge)
The age of the longest region in transition, in milliseconds
Shown as millisecond
hbase.master.assignmentmanager.rit_count_over_threshold
(gauge)
The number of regions that have been in transition longer than a threshold time
hbase.master.assignmentmanager.rit_count
(gauge)
The number of regions in transition
hbase.master.assignmentmanager.assign.min
(gauge)
hbase.master.assignmentmanager.assign.max
(gauge)
hbase.master.assignmentmanager.assign.mean
(gauge)
hbase.master.assignmentmanager.assign.median
(gauge)
hbase.master.assignmentmanager.assign.percentile.99
(gauge)
hbase.master.ipc.queue_size
(gauge)
Number of bytes in the call queues.
Shown as byte
hbase.master.ipc.num_calls_in_general_queue
(gauge)
Number of calls in the general call queue.
hbase.master.ipc.num_calls_in_replication_queue
(gauge)
Number of calls in the replication call queue.
hbase.master.ipc.num_calls_in_priority_queue
(gauge)
Number of calls in the priority call queue.
hbase.master.ipc.num_open_connections
(gauge)
Number of open connections.
hbase.master.ipc.num_active_handler
(gauge)
Number of active RPC handlers.
hbase.master.ipc.total_call_time.max
(gauge)
Total call time, including both queued and processing time.
Shown as millisecond
hbase.master.ipc.total_call_time.mean
(gauge)
Total call time, including both queued and processing time.
Shown as millisecond
hbase.master.ipc.total_call_time.median
(gauge)
Total call time, including both queued and processing time.
Shown as millisecond
hbase.master.ipc.total_call_time.percentile.99
(gauge)
Total call time, including both queued and processing time.
Shown as millisecond
hbase.master.server.tag.is_active_master
(gauge)
Is Active Master
hbase.master.server.num_region_servers
(gauge)
Number of RegionServers
hbase.master.server.num_dead_region_servers
(gauge)
Number of dead RegionServers
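
The metric names in this list appear to be aliases assigned in hbase_master.d/metrics.yaml, as with the is_active_master entry in the Configuration section. As a sketch of how another entry could be mapped using the same layout, the fragment below assumes the Master exposes a numRegionServers attribute on the Hadoop:service=HBase,name=Master,sub=Server bean. Verify the attribute name against your HBase version before relying on it.

```yaml
- include:
    domain: Hadoop
    bean:
      - Hadoop:service=HBase,name=Master,sub=Server
    attribute:
      # Assumed JMX attribute name; the alias is the metric name listed above.
      numRegionServers:
        metric_type: gauge
        alias: hbase.master.server.num_region_servers
```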

Events

The HBase Master check does not include any events.

Service Checks

The HBase Master check does not include any service checks.

Troubleshooting

Need help? Contact Datadog support.

HBase RegionServer Integration

Overview

Get metrics from the HBase RegionServer service in real time to:

  • Visualize and monitor HBase RegionServer states.
  • Be notified about HBase RegionServer failovers and events.

Setup

The HBase RegionServer check is not included in the Datadog Agent package, so you need to install it.

Installation

For Agent v7.21+ / v6.21+, follow the instructions below to install the HBase RegionServer check on your host. See Use Community Integrations to install with the Docker Agent or earlier versions of the Agent.

  1. Run the following command to install the Agent integration:

    datadog-agent integration install -t datadog-hbase_regionserver==<INTEGRATION_VERSION>
    
  2. Configure your integration similar to core integrations.

Configuration

  1. Edit the hbase_regionserver.d/conf.yaml file in the conf.d/ folder at the root of your Agent’s configuration directory to start collecting your HBase RegionServer metrics. See the sample hbase_regionserver.d/conf.yaml for all available configuration options.

  2. Restart the Agent.
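
As an illustration of step 1, the sketch below shows what a small hbase_regionserver.d/conf.yaml could contain. It assumes the check is JMX-based and that the RegionServer exposes JMX on localhost port 10102. The port and the tags are hypothetical values; the sample hbase_regionserver.d/conf.yaml remains the reference for the available options.

```yaml
init_config:
  is_jmx: true
  collect_default_metrics: true

instances:
  - host: localhost
    # Assumed JMX port: use the one your RegionServer JVM was started with.
    port: 10102
    # Hypothetical tags attached to every metric from this instance.
    tags:
      - hbase_cluster:example
      - role:regionserver
```

Tagging each instance makes it easier to filter RegionServer metrics by cluster when several clusters report to the same account.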

Log collection

  1. Log collection is disabled by default in the Datadog Agent. Enable it in datadog.yaml:

    logs_enabled: true
    
  2. Add this configuration block to your hbase_regionserver.d/conf.yaml file to start collecting your HBase RegionServer logs:

    logs:
      - type: file
        path: /path/to/my/directory/file.log
        source: hbase
    

    Change the path parameter value and configure it for your environment. See the sample hbase_regionserver.d/conf.yaml for all available configuration options.

  3. Restart the Agent.
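
RegionServer log file names usually include the host or user name, so a wildcard path can be more convenient than a fixed file name. The sketch below assumes logs are written under /var/log/hbase with a name containing "regionserver". Both the directory and the service value are hypothetical and should be adapted to your installation.

```yaml
logs:
  - type: file
    # Hypothetical location; the wildcard matches every RegionServer log file.
    path: /var/log/hbase/*regionserver*.log
    source: hbase
    # Hypothetical service name used to tell these logs apart from Master logs.
    service: hbase-regionserver
```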

Validation

Run the Agent’s status subcommand and look for hbase_regionserver under the Checks section.

Data Collected

Metrics

hbase.regionserver.ipc.queue_size
(gauge)
Number of bytes in the call queues.
Shown as byte
hbase.regionserver.ipc.num_open_connections
(gauge)
Number of open connections.
hbase.regionserver.ipc.num_active_handler
(gauge)
Number of active RPC handlers.
hbase.regionserver.ipc.total_call_time.max
(gauge)
Total call time, including both queued and processing time.
Shown as millisecond
hbase.regionserver.ipc.total_call_time.mean
(gauge)
Total call time, including both queued and processing time.
Shown as millisecond
hbase.regionserver.ipc.total_call_time.median
(gauge)
Total call time, including both queued and processing time.
Shown as millisecond
hbase.regionserver.ipc.total_call_time.percentile.99
(gauge)
Total call time, including both queued and processing time.
Shown as millisecond
hbase.regionserver.regions.num_regions
(gauge)
Number of regions in the metrics system
hbase.regionserver.replication.sink.applied_ops
(gauge)
Number of WAL entries applied on replication sink.
hbase.regionserver.replication.sink.age_of_last_applied_op
(gauge)
Replication time lag of last applied WAL entry between source and sink.
Shown as millisecond
hbase.regionserver.replication.sink.applied_batches
(gauge)
Number of WAL applying operations processed on replication sink.
hbase.regionserver.server.region_count
(gauge)
Number of regions
hbase.regionserver.server.store_count
(gauge)
Number of Stores
hbase.regionserver.server.hlog_file_count
(gauge)
Number of WAL Files
hbase.regionserver.server.hlog_file_size
(gauge)
Size of all WAL Files
Shown as byte
hbase.regionserver.server.store_file_count
(gauge)
Number of Store Files
hbase.regionserver.server.mem_store_size
(gauge)
Size of the memstore
Shown as byte
hbase.regionserver.server.store_file_size
(gauge)
Size of storefiles being served.
Shown as byte
hbase.regionserver.server.total_request_count
(gauge)
Total number of requests this RegionServer has answered.
hbase.regionserver.server.read_request_count
(gauge)
Number of read requests this region server has answered.
hbase.regionserver.server.write_request_count
(gauge)
Number of mutation requests this region server has answered.
hbase.regionserver.server.check_mutate_failed_count
(gauge)
Number of Check and Mutate calls that failed the checks.
hbase.regionserver.server.check_mutate_passed_count
(gauge)
Number of Check and Mutate calls that passed the checks.
hbase.regionserver.server.store_file_index_size
(gauge)
Size of indexes in storefiles on disk.
Shown as byte
hbase.regionserver.server.static_index_size
(gauge)
Uncompressed size of the static indexes.
Shown as byte
hbase.regionserver.server.static_bloom_size
(gauge)
Uncompressed size of the static bloom filters.
Shown as byte
hbase.regionserver.server.mutations_without_wal_count
(count)
Number of mutations that have been sent by clients with the write ahead logging turned off.
hbase.regionserver.server.mutations_without_wal_size
(gauge)
Size of data that has been sent by clients with the write ahead logging turned off.
Shown as byte
hbase.regionserver.server.percent_files_local
(gauge)
The percent of HFiles that are stored on the local HDFS DataNode.
Shown as percent
hbase.regionserver.server.percent_files_local_secondary_regions
(gauge)
The percent of HFiles used by secondary regions that are stored on the local HDFS DataNode.
Shown as percent
hbase.regionserver.server.split_queue_length
(gauge)
Length of the queue for splits.
hbase.regionserver.server.compaction_queue_length
(gauge)
Length of the queue for compactions.
hbase.regionserver.server.flush_queue_length
(gauge)
Length of the queue for region flushes
hbase.regionserver.server.block_cache_free_size
(gauge)
Size of the block cache that is not occupied.
Shown as byte
hbase.regionserver.server.block_cache_count
(gauge)
Number of blocks in the block cache.
hbase.regionserver.server.block_cache_size
(gauge)
Size of the block cache.
Shown as byte
hbase.regionserver.server.block_cache_hit_count
(gauge)
Number of hits on the block cache.
hbase.regionserver.server.block_cache_hit_count_primary
(gauge)
Number of hits on the primary replica in the block cache.
hbase.regionserver.server.block_cache_miss_count
(gauge)
Number of requests for a block that missed the block cache.
hbase.regionserver.server.block_cache_miss_count_primary
(gauge)
Number of requests for a block of primary replica that missed the block cache.
hbase.regionserver.server.block_cache_eviction_count
(gauge)
Count of the number of blocks evicted from the block cache.
hbase.regionserver.server.block_cache_eviction_count_primary
(gauge)
Count of the number of blocks evicted from primary replica in the block cache.
hbase.regionserver.server.block_cache_hit_percent
(gauge)
Percent of block cache requests that are hits
Shown as percent
hbase.regionserver.server.block_cache_express_hit_percent
(gauge)
The percent of the time that requests with the cache turned on hit the cache.
Shown as percent
hbase.regionserver.server.block_cache_failed_insertion_count
(gauge)
Number of times that a block cache insertion failed. Usually due to size restrictions.
Shown as millisecond
hbase.regionserver.server.updates_blocked_time
(gauge)
Number of milliseconds that updates have been blocked so that the memstore can be flushed.
Shown as millisecond
hbase.regionserver.server.flushed_cells_count
(gauge)
The number of cells flushed to disk
hbase.regionserver.server.compacted_cells_count
(gauge)
The number of cells processed during minor compactions
hbase.regionserver.server.major_compacted_cells_count
(gauge)
The number of cells processed during major compactions
hbase.regionserver.server.flushed_cells_size
(gauge)
The total amount of data flushed to disk, in bytes
Shown as byte
hbase.regionserver.server.compacted_cells_size
(gauge)
The total amount of data processed during minor compactions, in bytes
Shown as byte
hbase.regionserver.server.major_compacted_cells_size
(gauge)
The total amount of data processed during major compactions, in bytes
Shown as byte
hbase.regionserver.server.blocked_request_count
(gauge)
The number of requests blocked because the memstore size is larger than blockingMemStoreSize
hbase.regionserver.server.hedged_read
(gauge)
hbase.regionserver.server.hedged_read_wins
(gauge)
hbase.regionserver.server.pause_time_with_gc_num_ops
(gauge)

Shown as millisecond
hbase.regionserver.server.pause_time_with_gc.min
(gauge)

Shown as millisecond
hbase.regionserver.server.pause_time_with_gc.max
(gauge)

Shown as millisecond
hbase.regionserver.server.pause_time_with_gc.mean
(gauge)

Shown as millisecond
hbase.regionserver.server.pause_time_with_gc.median
(gauge)

Shown as millisecond
hbase.regionserver.server.pause_time_with_gc.percentile.99
(gauge)

Shown as millisecond
hbase.regionserver.server.mutate.num_ops
(gauge)
hbase.regionserver.server.mutate.min
(gauge)
hbase.regionserver.server.mutate.max
(gauge)
hbase.regionserver.server.mutate.mean
(gauge)
hbase.regionserver.server.mutate.median
(gauge)
hbase.regionserver.server.mutate.percentile.99
(gauge)
hbase.regionserver.server.slow_append_count
(gauge)
The number of Appends that took over 1000ms to complete
hbase.regionserver.server.pause_warn_threshold_exceeded
(gauge)
hbase.regionserver.server.slow_delete_count
(gauge)
The number of Deletes that took over 1000ms to complete
hbase.regionserver.server.increment.num_ops
(gauge)
hbase.regionserver.server.increment.min
(gauge)
hbase.regionserver.server.increment.max
(gauge)
hbase.regionserver.server.increment.mean
(gauge)
hbase.regionserver.server.increment.median
(gauge)
hbase.regionserver.server.increment.percentile.99
(gauge)
hbase.regionserver.server.replay.num_ops
(gauge)
hbase.regionserver.server.replay.min
(gauge)
hbase.regionserver.server.replay.max
(gauge)
hbase.regionserver.server.replay.mean
(gauge)
hbase.regionserver.server.replay.median
(gauge)
hbase.regionserver.server.replay.percentile.99
(gauge)
hbase.regionserver.server.flush_time.num_ops
(gauge)

Shown as millisecond
hbase.regionserver.server.flush_time.min
(gauge)

Shown as millisecond
hbase.regionserver.server.flush_time.max
(gauge)

Shown as millisecond
hbase.regionserver.server.flush_time.mean
(gauge)

Shown as millisecond
hbase.regionserver.server.flush_time.median
(gauge)

Shown as millisecond
hbase.regionserver.server.flush_time.percentile.99
(gauge)

Shown as millisecond
hbase.regionserver.server.pause_info_threshold_exceeded
(gauge)
hbase.regionserver.server.delete.num_ops
(gauge)
hbase.regionserver.server.delete.min
(gauge)
hbase.regionserver.server.delete.max
(gauge)
hbase.regionserver.server.delete.mean
(gauge)
hbase.regionserver.server.delete.median
(gauge)
hbase.regionserver.server.delete.percentile.99
(gauge)
hbase.regionserver.server.split_request_count
(gauge)
Number of splits requested
hbase.regionserver.server.split_success_count
(gauge)
Number of successfully executed splits
hbase.regionserver.server.slow_get_count
(gauge)
The number of Gets that took over 1000ms to complete
hbase.regionserver.server.get.num_ops
(gauge)
hbase.regionserver.server.get.min
(gauge)
hbase.regionserver.server.get.max
(gauge)
hbase.regionserver.server.get.mean
(gauge)
hbase.regionserver.server.get.median
(gauge)
hbase.regionserver.server.get.percentile.99
(gauge)
hbase.regionserver.server.scan_next.num_ops
(gauge)
hbase.regionserver.server.scan_next.min
(gauge)
hbase.regionserver.server.scan_next.max
(gauge)
hbase.regionserver.server.scan_next.mean
(gauge)
hbase.regionserver.server.scan_next.median
(gauge)
hbase.regionserver.server.scan_next.percentile.99
(gauge)
hbase.regionserver.server.pause_time_without_gc.num_ops
(gauge)

Shown as millisecond
hbase.regionserver.server.pause_time_without_gc.min
(gauge)

Shown as millisecond
hbase.regionserver.server.pause_time_without_gc.max
(gauge)

Shown as millisecond
hbase.regionserver.server.pause_time_without_gc.mean
(gauge)

Shown as millisecond
hbase.regionserver.server.pause_time_without_gc.median
(gauge)

Shown as millisecond
hbase.regionserver.server.pause_time_without_gc.percentile.99
(gauge)

Shown as millisecond
hbase.regionserver.server.slow_put_count
(gauge)
The number of Multis that took over 1000ms to complete
hbase.regionserver.server.slow_increment_count
(gauge)
The number of Increments that took over 1000ms to complete
hbase.regionserver.server.split_time.num_ops
(gauge)

Shown as millisecond
hbase.regionserver.server.split_time.min
(gauge)

Shown as millisecond
hbase.regionserver.server.split_time.max
(gauge)

Shown as millisecond
hbase.regionserver.server.split_time.mean
(gauge)

Shown as millisecond
hbase.regionserver.server.split_time.median
(gauge)

Shown as millisecond
hbase.regionserver.server.split_time.percentile.99
(gauge)

Shown as millisecond
hbase.regionserver.wal.append_size.num_ops
(gauge)
Size (in bytes) of the data appended to the WAL.
Shown as byte
hbase.regionserver.wal.append_size.min
(gauge)
Size (in bytes) of the data appended to the WAL.
Shown as byte
hbase.regionserver.wal.append_size.max
(gauge)
Size (in bytes) of the data appended to the WAL.
Shown as byte
hbase.regionserver.wal.append_size.mean
(gauge)
Size (in bytes) of the data appended to the WAL.
Shown as byte
hbase.regionserver.wal.append_size.median
(gauge)
Size (in bytes) of the data appended to the WAL.
Shown as byte
hbase.regionserver.wal.append_size.percentile.99
(gauge)
Size (in bytes) of the data appended to the WAL.
Shown as byte
hbase.regionserver.wal.sync_time.num_ops
(gauge)
The time it took to sync the WAL to HDFS.
Shown as millisecond
hbase.regionserver.wal.sync_time.min
(gauge)
The time it took to sync the WAL to HDFS.
Shown as millisecond
hbase.regionserver.wal.sync_time.max
(gauge)
The time it took to sync the WAL to HDFS.
Shown as millisecond
hbase.regionserver.wal.sync_time.mean
(gauge)
The time it took to sync the WAL to HDFS.
Shown as millisecond
hbase.regionserver.wal.sync_time.median
(gauge)
The time it took to sync the WAL to HDFS.
Shown as millisecond
hbase.regionserver.wal.sync_time.percentile.99
(gauge)
The time it took to sync the WAL to HDFS.
Shown as millisecond
hbase.regionserver.wal.slow_append_count
(gauge)
Number of appends that were slow.
hbase.regionserver.wal.roll_request
(gauge)
How many times a log roll has been requested total
Shown as millisecond
hbase.regionserver.wal.append_count
(gauge)
Number of appends to the write ahead log.
hbase.regionserver.wal.low_replica_roll_request
(gauge)
How many times a log roll was requested due to too few DataNodes in the write pipeline.
Shown as millisecond
hbase.regionserver.wal.append_time.num_ops
(gauge)
Time an append to the log took.
Shown as millisecond
hbase.regionserver.wal.append_time.min
(gauge)
Time an append to the log took.
Shown as millisecond
hbase.regionserver.wal.append_time.max
(gauge)
Time an append to the log took.
Shown as millisecond
hbase.regionserver.wal.append_time.mean
(gauge)
Time an append to the log took.
Shown as millisecond
hbase.regionserver.wal.append_time.median
(gauge)
Time an append to the log took.
Shown as millisecond
hbase.regionserver.wal.append_time.percentile.99
(gauge)
Time an append to the log took.
Shown as millisecond
hbase.jvm_metrics.mem_non_heap_used_in_mb
(gauge)
Non-heap memory used in MB
hbase.jvm_metrics.mem_non_heap_committed_in_mb
(gauge)
Non-heap memory committed in MB
hbase.jvm_metrics.mem_non_heap_max_in_mb
(gauge)
Non-heap memory max in MB
hbase.jvm_metrics.mem_heap_used_in_mb
(gauge)
Heap memory used in MB
hbase.jvm_metrics.mem_heap_committed_in_mb
(gauge)
Heap memory committed in MB
hbase.jvm_metrics.mem_heap_max_in_mb
(gauge)
Heap memory max in MB
hbase.jvm_metrics.mem_max_in_mb
(gauge)
Max memory size in MB
hbase.jvm_metrics.gc_count_par_new
(gauge)
GC Count for ParNew
hbase.jvm_metrics.gc_time_millis_par_new
(gauge)
GC Time for ParNew
Shown as millisecond
hbase.jvm_metrics.gc_count_concurrent_mark_sweep
(gauge)
GC Count for ConcurrentMarkSweep
hbase.jvm_metrics.gc_time_millis_concurrent_mark_sweep
(gauge)
GC Time for ConcurrentMarkSweep
Shown as millisecond
hbase.jvm_metrics.gc_count
(gauge)
Total GC count
hbase.jvm_metrics.gc_time_millis
(gauge)
Total GC time in milliseconds
Shown as millisecond

Events

The HBase RegionServer check does not include any events.

Service Checks

The HBase RegionServer check does not include any service checks.

Troubleshooting

Need help? Contact Datadog support.
