Mesos Master


Supported OS: Linux, Mac OS

Integration version: 5.1.0


This check collects metrics for Mesos masters. For Mesos slave metrics, see the Mesos Slave integration.


Overview

This check collects metrics from Mesos masters for:

Cluster resources
Slaves registered, active, inactive, connected, disconnected, etc.
Number of tasks failed, finished, staged, running, etc.
Number of frameworks active, inactive, connected, and disconnected

And many more.

Setup

Installation

The installation is the same on Mesos with and without DC/OS. Run the datadog-agent container on each of your Mesos master nodes:

docker run -d --name datadog-agent \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /proc/:/host/proc/:ro \
  -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
  -e DD_API_KEY=<YOUR_DATADOG_API_KEY> \
  -e MESOS_MASTER=true \
  -e MARATHON_URL=http://leader.mesos:8080 \
  datadog/agent:latest

Substitute your Datadog API key and Mesos Master’s API URL into the command above.

Configuration

If you passed the correct Master URL when starting datadog-agent, the Agent is already using a default mesos_master.d/conf.yaml to collect metrics from your masters. See the sample mesos_master.d/conf.yaml for all available configuration options.

If your masters’ API uses a self-signed certificate, set disable_ssl_validation: true in mesos_master.d/conf.yaml.
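
To see what skipping certificate validation means in practice, the following is a minimal Python sketch of a client that fetches a master's metrics with or without TLS verification. It is not the Agent's implementation. The function names are hypothetical, and the URL in the usage note (port 5050 and the /metrics/snapshot endpoint on leader.mesos) is an assumption about a typical Mesos master setup.

```python
import json
import ssl
import urllib.request


def build_ssl_context(validate):
    """Return an SSL context; with validate=False, certificate checks are skipped."""
    ctx = ssl.create_default_context()
    if not validate:
        # check_hostname must be turned off before verify_mode can be CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


def fetch_snapshot(url, validate=True, timeout=5):
    """Fetch and decode a JSON metrics snapshot from the given master URL."""
    ctx = build_ssl_context(validate)
    with urllib.request.urlopen(url, context=ctx, timeout=timeout) as resp:
        return json.load(resp)
```

For a master with a self-signed certificate, a call might look like fetch_snapshot("https://leader.mesos:5050/metrics/snapshot", validate=False). Disabling validation removes protection against impersonated endpoints, so it is best limited to trusted networks.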

Log collection

Log collection is disabled by default in the Datadog Agent. Enable it in your datadog.yaml file:
```
logs_enabled: true
```
Add this configuration block to your mesos_master.d/conf.yaml file to start collecting your Mesos logs:
```
logs:
  - type: file
    path: /var/log/mesos/*
    source: mesos
```
Change the path parameter value to match your environment, or collect from the default Docker stdout instead:
```
logs:
  - type: docker
    source: mesos
```
See the sample mesos_master.d/conf.yaml for all available configuration options.
Restart the Agent.

To enable logs for Kubernetes environments, see Kubernetes Log Collection.

Validation

In Datadog, search for mesos.cluster in the Metrics Explorer.

Data Collected

Metrics


mesos.cluster.cpus_percent (gauge)	Percentage of allocated CPUs. Shown as percent.
mesos.cluster.cpus_total (gauge)	Number of CPUs
mesos.cluster.cpus_used (gauge)	Number of allocated CPUs
mesos.cluster.disk_percent (gauge)	Percentage of allocated disk space. Shown as percent.
mesos.cluster.disk_total (gauge)	Disk space. Shown as mebibyte.
mesos.cluster.disk_used (gauge)	Allocated disk space. Shown as mebibyte.
mesos.cluster.dropped_messages (gauge)	Number of dropped messages. Shown as message.
mesos.cluster.event_queue_dispatches (gauge)	Number of dispatches in the event queue
mesos.cluster.event_queue_http_requests (gauge)	Number of HTTP requests in the event queue. Shown as request.
mesos.cluster.event_queue_messages (gauge)	Number of messages in the event queue. Shown as message.
mesos.cluster.frameworks_active (gauge)	Number of active frameworks
mesos.cluster.frameworks_connected (gauge)	Number of connected frameworks
mesos.cluster.frameworks_disconnected (gauge)	Number of disconnected frameworks
mesos.cluster.frameworks_inactive (gauge)	Number of inactive frameworks
mesos.cluster.gpus_percent (gauge)	Percentage of allocated GPUs. Shown as percent.
mesos.cluster.gpus_total (gauge)	Number of GPUs
mesos.cluster.gpus_used (gauge)	Number of allocated GPUs
mesos.cluster.invalid_framework_to_executor_messages (gauge)	Number of invalid framework messages. Shown as message.
mesos.cluster.invalid_status_update_acknowledgements (gauge)	Number of invalid status update acknowledgements
mesos.cluster.invalid_status_updates (gauge)	Number of invalid status updates
mesos.cluster.mem_percent (gauge)	Percentage of allocated memory. Shown as percent.
mesos.cluster.mem_total (gauge)	Total memory. Shown as mebibyte.
mesos.cluster.mem_used (gauge)	Allocated memory. Shown as mebibyte.
mesos.cluster.outstanding_offers (gauge)	Number of outstanding resource offers
mesos.cluster.slave_registrations (gauge)	Number of slaves that were able to cleanly re-join the cluster and connect back to the master after the master is disconnected.
mesos.cluster.slave_removals (gauge)	Number of slaves removed for various reasons, including maintenance
mesos.cluster.slave_reregistrations (gauge)	Number of slave re-registrations
mesos.cluster.slave_shutdowns_canceled (gauge)	Number of cancelled slave shutdowns
mesos.cluster.slave_shutdowns_scheduled (gauge)	Number of slaves which have failed their health check and are scheduled to be removed
mesos.cluster.slaves_active (gauge)	Number of active slaves
mesos.cluster.slaves_connected (gauge)	Number of connected slaves
mesos.cluster.slaves_disconnected (gauge)	Number of disconnected slaves
mesos.cluster.slaves_inactive (gauge)	Number of inactive slaves
mesos.cluster.tasks_error (gauge)	Number of tasks that were invalid. Shown as task.
mesos.cluster.tasks_failed (count)	Number of failed tasks. Shown as task.
mesos.cluster.tasks_finished (count)	Number of finished tasks. Shown as task.
mesos.cluster.tasks_killed (count)	Number of killed tasks. Shown as task.
mesos.cluster.tasks_lost (count)	Number of lost tasks. Shown as task.
mesos.cluster.tasks_running (gauge)	Number of running tasks. Shown as task.
mesos.cluster.tasks_staging (gauge)	Number of staging tasks. Shown as task.
mesos.cluster.tasks_starting (gauge)	Number of starting tasks. Shown as task.
mesos.cluster.valid_framework_to_executor_messages (gauge)	Number of valid framework messages. Shown as message.
mesos.cluster.valid_status_update_acknowledgements (gauge)	Number of valid status update acknowledgements
mesos.cluster.valid_status_updates (gauge)	Number of valid status updates
mesos.framework.cpu (gauge)	Framework cpu
mesos.framework.disk (gauge)	Framework disk. Shown as mebibyte.
mesos.framework.mem (gauge)	Framework mem. Shown as mebibyte.
mesos.registrar.log.recovered (gauge)	Registrar log recovered
mesos.registrar.queued_operations (gauge)	Number of queued operations
mesos.registrar.registry_size_bytes (gauge)	Registry size. Shown as byte.
mesos.registrar.state_fetch_ms (gauge)	Registry read latency. Shown as millisecond.
mesos.registrar.state_store_ms (gauge)	Registry write latency. Shown as millisecond.
mesos.registrar.state_store_ms.count (gauge)	Registry write count
mesos.registrar.state_store_ms.max (gauge)	Maximum registry write latency. Shown as millisecond.
mesos.registrar.state_store_ms.min (gauge)	Minimum registry write latency. Shown as millisecond.
mesos.registrar.state_store_ms.p50 (gauge)	Median registry write latency. Shown as millisecond.
mesos.registrar.state_store_ms.p90 (gauge)	90th percentile registry write latency. Shown as millisecond.
mesos.registrar.state_store_ms.p95 (gauge)	95th percentile registry write latency. Shown as millisecond.
mesos.registrar.state_store_ms.p99 (gauge)	99th percentile registry write latency. Shown as millisecond.
mesos.registrar.state_store_ms.p999 (gauge)	99.9th percentile registry write latency. Shown as millisecond.
mesos.registrar.state_store_ms.p9999 (gauge)	99.99th percentile registry write latency. Shown as millisecond.
mesos.role.cpu (gauge)	Role cpu
mesos.role.disk (gauge)	Role disk. Shown as mebibyte.
mesos.role.mem (gauge)	Role mem. Shown as mebibyte.
mesos.stats.elected (gauge)	Whether this is the elected master
mesos.stats.registered (gauge)	Whether this slave is registered with a master
mesos.stats.system.cpus_total (gauge)	Number of CPUs available
mesos.stats.system.load_15min (gauge)	Load average for the past 15 minutes
mesos.stats.system.load_1min (gauge)	Load average for the past minute
mesos.stats.system.load_5min (gauge)	Load average for the past 5 minutes
mesos.stats.system.mem_free_bytes (gauge)	Free memory. Shown as byte.
mesos.stats.system.mem_total_bytes (gauge)	Total memory. Shown as byte.
mesos.stats.uptime_secs (gauge)	Uptime. Shown as second.
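
When cross-checking these names against what a master reports directly, it may help to translate the master's own metric keys into the names listed above. The sketch below does this for a small subset. The key-to-name pairs in NAME_MAP and the sample payload are assumptions made for illustration (Mesos-style keys such as master/cpus_total), not a copy of the integration's actual mapping table.

```python
import json

# Illustrative subset only: assumed Mesos snapshot key -> Datadog metric name.
# Consult the integration source for the authoritative mapping.
NAME_MAP = {
    "master/cpus_total": "mesos.cluster.cpus_total",
    "master/cpus_used": "mesos.cluster.cpus_used",
    "master/tasks_running": "mesos.cluster.tasks_running",
    "master/elected": "mesos.stats.elected",
    "system/load_1min": "mesos.stats.system.load_1min",
}


def translate_snapshot(snapshot):
    """Return {datadog_name: value} for the snapshot keys this sketch knows about."""
    return {NAME_MAP[key]: value for key, value in snapshot.items() if key in NAME_MAP}


# Hypothetical sample payload, shaped like a flat JSON object of key/value pairs.
SAMPLE = json.loads(
    '{"master/cpus_total": 16.0, "master/cpus_used": 4.5,'
    ' "master/elected": 1.0, "allocator/unmapped_example": 0.0}'
)

if __name__ == "__main__":
    for name, value in sorted(translate_snapshot(SAMPLE).items()):
        print(name, value)
```

Keys that are not in the mapping are silently dropped, which keeps the output limited to names that can be searched for in the Metrics Explorer.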

Events

The Mesos Master check does not include any events.

Service Checks

mesos_master.can_connect

Returns CRITICAL if the Agent cannot connect to the Mesos Master API to collect metrics, UNKNOWN if the master is not detected as the leader, otherwise OK.

Statuses: ok, critical, unknown
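
The status rule described above can be sketched as a small decision function. This is an illustration of the documented behavior under stated assumptions, not the check's actual code: the function name and its two boolean inputs are hypothetical.

```python
OK = "ok"
CRITICAL = "critical"
UNKNOWN = "unknown"


def can_connect_status(reachable, is_leader):
    """Map connection and leadership state to a service check status.

    reachable: whether the Agent could connect to the Mesos Master API.
    is_leader: whether the contacted master was detected as the leader.
    """
    if not reachable:
        # Connection failure takes precedence over leadership.
        return CRITICAL
    if not is_leader:
        return UNKNOWN
    return OK
```

For example, can_connect_status(True, False) yields "unknown", which matches a reachable master that is not the elected leader.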

Troubleshooting

Need help? Contact Datadog support.