Collect resource usage metrics, such as CPU, memory, I/O, and number of threads, for specific running processes on any host.
Use Process Monitors to configure thresholds for how many instances of a specific process should be running and get alerts when the thresholds aren’t met (see Service Checks below).
Unlike many checks, the Process check doesn’t monitor anything useful by default. You must configure which processes you want to monitor.
There is no standard default check configuration. A typical process.d/conf.yaml defines one instance per group of processes to monitor, for example the SSH/SSHD processes. See the sample process.d/conf.yaml for all available configuration options.
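As a hedged illustration of the SSH/SSHD case, a minimal process.d/conf.yaml might look like the sketch below. The instance name `ssh` and the two search strings are illustrative choices, not required values; check the sample process.d/conf.yaml for the authoritative option list.

```yaml
init_config:

instances:
  # Illustrative instance name; it becomes the process_name tag value.
  - name: ssh
    # Processes to count and collect metrics for.
    search_string:
      - ssh
      - sshd
```

Each additional group of processes gets its own entry under `instances`.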
Note: After you make configuration changes, make sure you restart the Agent.
Retrieving some process metrics requires the Datadog collector to run either as the monitored process's user or with privileged access. For the open_file_descriptors metric on Unix platforms, there is an additional configuration option: setting try_sudo to true in your conf.yaml file allows the Process check to try using sudo to collect the metric. This option also requires an appropriate sudoers rule in /etc/sudoers.
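The following sketch shows where try_sudo sits in an instance. The instance name and search string are placeholders. The sudoers rule in the trailing comment is an assumption: it supposes the Agent runs as a user named `dd-agent` and that the check lists `/proc/*/fd/` through `/bin/ls`. Verify both against your installation before adding anything to /etc/sudoers.

```yaml
init_config:

instances:
  # Placeholder instance; only try_sudo is the point of this example.
  - name: ssh
    search_string:
      - sshd
    # Allow the check to fall back to sudo for open_file_descriptors.
    try_sudo: true

# Assumed matching /etc/sudoers rule (not YAML, shown as a comment; verify before use):
#   dd-agent ALL=NOPASSWD: /bin/ls /proc/*/fd/
```

Scoping the rule to a single read-only command keeps the extra privilege as narrow as possible.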
The following metrics are not available on Linux or macOS:
Process I/O metrics are not available on Linux or macOS because the files that the Agent reads (/proc/<PID>/io) are readable only by the process's owner. For more information, read the Agent FAQ.
The following metrics are not available on Windows:
Note: Use a WMI check to gather page fault metrics on Windows.
Note: In v6.11+ on Windows, the Agent runs as ddagentuser instead of Local System. As a result, it cannot read the full command line of processes running under other users, nor determine which user owns those processes. The following check options therefore do not work:
exact_match when set to false
user, which allows selecting processes that belong to a specific user
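For reference, the two affected options appear in an instance roughly as sketched below. The instance name, search string, and user name are hypothetical placeholders. On Windows with Agent v6.11+, expect an instance like this to miss processes owned by other users.

```yaml
init_config:

instances:
  # Hypothetical instance; name, search string, and user are placeholders.
  - name: app_worker
    search_string:
      - "worker --queue default"
    # Search the full command line rather than requiring an exact process-name match.
    exact_match: false
    # Only select processes that belong to this user.
    user: appuser
```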
All metrics are reported per instance configured in process.d/conf.yaml and are tagged process_name:<instance_name>.
The system.processes.cpu.pct metric sent by this check is only accurate for processes that live for more
than 30 seconds. Do not expect its value to be accurate for shorter-lived processes.
process.up
Returns OK if the number of matching processes is within the warning thresholds, WARNING if it is outside the warning thresholds, and CRITICAL if it is outside the critical thresholds.
Statuses: ok, warning, critical
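To make the threshold behavior concrete, here is a hedged sketch of an instance with both ranges set. The numbers are arbitrary, and the instance name and search string are placeholders. It assumes each range is written as [lower bound, upper bound] and treated as inclusive, which is worth confirming against the sample process.d/conf.yaml.

```yaml
init_config:

instances:
  # Placeholder instance; the threshold values below are arbitrary examples.
  - name: ssh
    search_string:
      - sshd
    thresholds:
      # Outside this range (fewer than 3 or more than 5 processes): WARNING.
      warning: [3, 5]
      # Outside this range (fewer than 1 or more than 7 processes): CRITICAL.
      critical: [1, 7]
```

Under those assumptions, 3 to 5 matching processes report OK, 1, 2, 6, or 7 report WARNING, and 0 or more than 7 report CRITICAL.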