Map Reduce

Supported OS Linux Windows Mac OS

Integration version6.0.0

MapReduce Dashboard

Overview

Get metrics from mapreduce service in real time to:

  • Visualize and monitor mapreduce states
  • Be notified about mapreduce failovers and events.

Setup

Installation

The Mapreduce check is included in the Datadog Agent package, so you don’t need to install anything else on your servers.

Configuration

Host

To configure this check for an Agent running on a host:

  1. Edit the mapreduce.d/conf.yaml file, in the conf.d/ folder at the root of your Agent’s configuration directory to point to your server and port, set the masters to monitor. See the sample mapreduce.d/conf.yaml for all available configuration options.

  2. Restart the Agent.

Log collection
  1. Collecting logs is disabled by default in the Datadog Agent, enable it in your datadog.yaml file:

    logs_enabled: true
    
  2. Uncomment and edit the logs configuration block in your mapreduce.d/conf.yaml file. Change the type, path, and service parameter values based on your environment. See the sample mapreduce.d/conf.yaml for all available configuration options.

    logs:
      - type: file
        path: <LOG_FILE_PATH>
        source: mapreduce
        service: <SERVICE_NAME>
        # To handle multi line that starts with yyyy-mm-dd use the following pattern
        # log_processing_rules:
        #   - type: multi_line
        #     pattern: \d{4}\-\d{2}\-\d{2} \d{2}:\d{2}:\d{2},\d{3}
        #     name: new_log_start_with_date
    
  3. Restart the Agent.

Containerized

For containerized environments, see the Autodiscovery Integration Templates for guidance on applying the parameters below.

ParameterValue
<INTEGRATION_NAME>mapreduce
<INIT_CONFIG>blank or {}
<INSTANCE_CONFIG>{"resourcemanager_uri": "https://%%host%%:8088", "cluster_name":"<MAPREDUCE_CLUSTER_NAME>"}
Log collection

Collecting logs is disabled by default in the Datadog Agent. To enable it, see the Docker Log Collection.

Then, set log integrations as Docker labels:

LABEL "com.datadoghq.ad.logs"='[{"source": "mapreduce", "service": "<SERVICE_NAME>"}]'

Validation

Run the Agent’s status subcommand and look for mapreduce under the Checks section.

Data Collected

Metrics

mapreduce.job.counter.map_counter_value
(rate)
Counter value of map tasks
Shown as task
mapreduce.job.counter.reduce_counter_value
(rate)
Counter value of reduce tasks
Shown as task
mapreduce.job.counter.total_counter_value
(rate)
Counter value of all tasks
Shown as task
mapreduce.job.elapsed_time.95percentile
(gauge)
95th percentile elapsed time since the application started
Shown as millisecond
mapreduce.job.elapsed_time.avg
(gauge)
Average elapsed time since the application started
Shown as millisecond
mapreduce.job.elapsed_time.count
(rate)
Number of times the elapsed time was sampled
mapreduce.job.elapsed_time.max
(gauge)
Max elapsed time since the application started
Shown as millisecond
mapreduce.job.elapsed_time.median
(gauge)
Median elapsed time since the application started
Shown as millisecond
mapreduce.job.failed_map_attempts
(rate)
Number of failed map attempts
Shown as task
mapreduce.job.failed_reduce_attempts
(rate)
Number of failed reduce attempts
Shown as task
mapreduce.job.killed_map_attempts
(rate)
Number of killed map attempts
Shown as task
mapreduce.job.killed_reduce_attempts
(rate)
Number of killed reduce attempts
Shown as task
mapreduce.job.map.task.elapsed_time.95percentile
(gauge)
95th percentile of all map tasks elapsed time
Shown as millisecond
mapreduce.job.map.task.elapsed_time.avg
(gauge)
Average of all map tasks elapsed time
Shown as millisecond
mapreduce.job.map.task.elapsed_time.count
(rate)
Number of times the map tasks elapsed time were sampled
mapreduce.job.map.task.elapsed_time.max
(gauge)
Max of all map tasks elapsed time
Shown as millisecond
mapreduce.job.map.task.elapsed_time.median
(gauge)
Median of all map tasks elapsed time
Shown as millisecond
mapreduce.job.maps_completed
(rate)
Number of completed maps
Shown as task
mapreduce.job.maps_pending
(rate)
Number of pending maps
Shown as task
mapreduce.job.maps_running
(rate)
Number of running maps
Shown as task
mapreduce.job.maps_total
(rate)
Total number of maps
Shown as task
mapreduce.job.new_map_attempts
(rate)
Number of new map attempts
Shown as task
mapreduce.job.new_reduce_attempts
(rate)
Number of new reduce attempts
Shown as task
mapreduce.job.reduce.task.elapsed_time.95percentile
(gauge)
95th percentile of all reduce tasks elapsed time
Shown as millisecond
mapreduce.job.reduce.task.elapsed_time.avg
(gauge)
Average of all reduce tasks elapsed time
Shown as millisecond
mapreduce.job.reduce.task.elapsed_time.count
(rate)
Number of times the reduce tasks elapsed time were sampled
mapreduce.job.reduce.task.elapsed_time.max
(gauge)
Max of all reduce tasks elapsed time
Shown as millisecond
mapreduce.job.reduce.task.elapsed_time.median
(gauge)
Median of all reduce tasks elapsed time
Shown as millisecond
mapreduce.job.reduces_completed
(rate)
Number of completed reduces
Shown as task
mapreduce.job.reduces_pending
(rate)
Number of pending reduces
Shown as task
mapreduce.job.reduces_running
(rate)
Number of running reduces
Shown as task
mapreduce.job.reduces_total
(rate)
Number of reduces
Shown as task
mapreduce.job.running_map_attempts
(rate)
Number of running map attempts
Shown as task
mapreduce.job.running_reduce_attempts
(rate)
Number of running reduce attempts
Shown as task
mapreduce.job.successful_map_attempts
(rate)
Number of successful map attempts
Shown as task
mapreduce.job.successful_reduce_attempts
(rate)
Number of successful reduce attempts
Shown as task

Events

The Mapreduce check does not include any events.

Service Checks

mapreduce.resource_manager.can_connect
Returns CRITICAL if the Agent is unable to connect to the Resource Manager. Returns OK otherwise.
Statuses: ok, critical

mapreduce.application_master.can_connect
Returns CRITICAL if the Agent is unable to connect to the Application Master. Returns OK otherwise.
Statuses: ok, critical

Troubleshooting

Need help? Contact Datadog support.

Further Reading

PREVIEWING: esther/docs-9518-update-example-control-sensitive-log-data