flume

Supported OS Linux Mac OS Windows

Integration version0.0.1

Overview

This check monitors Apache Flume.

Setup

The Flume check is not included in the Datadog Agent package, so you need to install it.

Installation

For Agent v7.21+ / v6.21+, follow the instructions below to install the Flume check on your host. See Use Community Integrations to install with the Docker Agent or earlier versions of the Agent.

  1. Run the following command to install the Agent integration:

    datadog-agent integration install -t datadog-flume==<INTEGRATION_VERSION>
    
  2. Configure your integration similar to core integrations.

Configuration

  1. Configure the Flume agent to enable JMX by adding the following JVM arguments to your flume-env.sh:
export JAVA_OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=5445 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false"
  1. Edit the flume.d/conf.yaml file, in the conf.d/ folder at the root of your Agent’s configuration directory to start collecting Flume performance data. See the sample flume.d/conf.yaml file for all available configuration options.

    This check has a limit of 350 metrics per instance. The number of returned metrics is indicated in the status output. You can specify the metrics you are interested in by editing the configuration below. For detailed instructions on customizing the metrics to collect, see the JMX Checks documentation. If you need to monitor more metrics, contact Datadog support.

  2. Restart the Agent

Validation

Run the Agent’s status subcommand and look for flume under the Checks section.

Component metrics

The metrics retrieved by this check depend on the source, channel, and sink used by your Flume agent. For a full list of metrics exposed by each component, review Available Component Metrics from the Apache Flume documentation. For a list of the metrics that you can see in Datadog, see the Metrics section on this page.

Data Collected

Metrics

flume.channel.capacity
(gauge)
The maximum number of events that can be queued in the channel at any time. For channel types without a capacity limit the value will be zero.
Shown as event
flume.channel.fill_percentage
(gauge)
The channel fill percentage.
Shown as percent
flume.channel.size
(gauge)
The number of events currently queued in the channel.
Shown as event
flume.channel.event_put_attempt_count
(count)
The total number of events that have been attempted to be put into the channel.
Shown as event
flume.channel.event_put_success_count
(count)
The total number of events that have successfully been put into the channel.
Shown as event
flume.channel.event_take_attempt_count
(count)
The total number of attempts that have been made to take an event from the channel.
Shown as event
flume.channel.event_take_success_count
(count)
The total number of events that have successfully been taken from the channel.
Shown as event
flume.channel.kafka_commit_timer
(gauge)
The timer for the Kafka channel commits.
Shown as time
flume.channel.kafka_event_get_timer
(gauge)
The timer for the kafka channel retrieving events.
Shown as time
flume.channel.kafka_event_send_timer
(gauge)
The timer for the Kafka channel sending events.
Shown as time
flume.channel.rollbackcount
(count)
The count of rollbacks from the kafka channel.
Shown as event
flume.sink.event_write_fail
(count)
The total number of failed write events.
Shown as event
flume.sink.batch_empty_count
(count)
The number of append batches attempted containing zero events.
Shown as event
flume.sink.channel_read_fail
(count)
The number of failed read events from the channel.
Shown as event
flume.sink.batch_complete_count
(count)
The number of append batches attempted containing the maximum number of events supported by the next hop.
Shown as event
flume.sink.batch_underflow_count
(count)
The number of append batches attempted containing less than the maximum number of events supported by the next hop.
Shown as event
flume.sink.connection_closed_count
(count)
The number of connections closed by this sink.
Shown as connection
flume.sink.connection_failed_count
(count)
The number of failed connections.
Shown as connection
flume.sink.connection_created_count
(count)
The number of connections created by this sink. Only applicable to some sink types.
Shown as connection
flume.sink.event_drain_attempt_count
(count)
The total number of events that have been attempted to be drained to the next hop.
Shown as event
flume.sink.event_drain_success_count
(count)
The total number of events that have successfully been drained to the next hop
Shown as event
flume.sink.kafka_event_sent_timer
(gauge)
The timer for the Kafka sink sending events.
Shown as time
flume.sink.rollbackcount
(gauge)
The count of rollbacks from the Kafka sink.
Shown as event
flume.source.event_read_fail
(count)
The total number of failed read source events.
Shown as event
flume.source.channel_write_fail
(count)
The total number of failed channel write events.
Shown as event
flume.source.event_accepted_count
(count)
The total number of events successfully accepted, either through append batches or single-event appends.
Shown as event
flume.source.event_received_count
(count)
The total number of events received, either through append batches or single-event appends.
Shown as event
flume.source.append_accepted_count
(count)
The total number of single-event appends successfully accepted.
Shown as event
flume.source.append_received_count
(count)
The total number of single-event appends received.
Shown as event
flume.source.open_connection_count
(count)
The number of open connections
Shown as connection
flume.source.generic_processing_fail
(count)
The total number of generic processing failures.
Shown as event
flume.source.append_batch_accepted_count
(count)
The total number of append batches accepted successfully.
Shown as event
flume.source.append_batch_received_count
(count)
The total number of append batches received.
Shown as event
flume.source.kafka_commit_timer
(gauge)
The timer for the Kafka source committing events.
Shown as time
flume.source.kafka_empty_count
(count)
The count of empty events from the Kafka source.
Shown as event
flume.source.kafka_event_get_timer
(gauge)
The timer for the Kafka source retrieving events.
Shown as time

Events

Flume does not include any events.

Service Checks

flume.can_connect
Returns CRITICAL if the Agent is unable to connect to and collect metrics from the monitored Flume instance. Returns OK otherwise.
Statuses: ok, critical

Troubleshooting

Need help? Contact Datadog support.

PREVIEWING: rtrieu/product-analytics-ui-changes