This check monitors Systemd and the units it manages through the Datadog Agent.
Track the state and health of Systemd.
Monitor the units, services, and sockets managed by Systemd.
Setup
Installation
The Systemd check is included in the Datadog Agent package. No additional installation is needed on your server.
Configuration
Host
To configure this check for an Agent running on a host:
Edit the systemd.d/conf.yaml file in the conf.d/ folder at the root of your Agent’s configuration directory to start collecting your Systemd performance data.
See the sample systemd.d/conf.yaml for all available configuration options.
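A minimal systemd.d/conf.yaml for a host might look like the sketch below. It uses the same init_config / instances / unit_names structure as the Helm configuration later in this section. The unit names are only examples, and the sample conf.yaml remains the reference for every available option.

```yaml
# conf.d/systemd.d/conf.yaml (sketch; unit names are examples)
init_config:

instances:
  # List the systemd units to monitor by their full unit names,
  # including the suffix (.service, .socket, ...).
  - unit_names:
      - ssh.service
      - docker.socket
```

After you edit the file, restart the Agent so that the check loads the new configuration.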
For containerized environments, mount the host's /run/systemd/ folder into the Agent container. This folder contains the /run/systemd/private socket that the check needs to retrieve Systemd data.
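As a sketch for Docker Compose, the host's /run/systemd/ directory can be bind-mounted into the Agent container. The service name, image tag, and API key placeholder are assumptions. The container path /host/run/systemd/ matches the mountPath used in the Helm configuration of this section. Other volumes the Agent normally needs for its other features are omitted here.

```yaml
# docker-compose.yml (sketch; service name and image tag are assumptions)
services:
  datadog-agent:
    image: gcr.io/datadoghq/agent:7
    environment:
      - DD_API_KEY=<YOUR_API_KEY>
    volumes:
      # Expose the host's systemd runtime directory, including the
      # /run/systemd/private socket, read-only inside the container.
      - /run/systemd/:/host/run/systemd/:ro
```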
For Helm configurations, you can set up the Datadog Agent to monitor systemd units (such as kubelet.service and ssh.service) by defining volume mounts and volumes that give the Agent container access to the systemd-related files and directories on the host. For example:
```yaml
datadog:
  # (...)
  confd:
    # Custom config file for SystemD
    # Example: https://github.com/DataDog/datadog-agent/blob/main/cmd/agent/dist/conf.d/systemd.d/conf.yaml.example
    systemd.yaml: |-
      init_config:
      instances:
        - unit_names:
            - kubelet.service
            - ssh.service
agents:
  # Custom Mounts for SystemD socket (/run/systemd/private)
  volumeMounts:
    - name: systemd
      mountPath: /host/run/systemd/ # the path within the container where the volume will be mounted
  volumes:
    - name: systemd
      hostPath:
        path: /run/systemd/ # the path on the host machine that will be mounted into the container
```
Metrics
systemd.service.cpu_time_consumed (gauge)
The overall CPU time consumed by the service in nanoseconds (CPUUsageNSec). Requires the Systemd CPUAccounting setting to be enabled and Systemd version >= 220. Shown as nanosecond
systemd.service.memory_usage (gauge)
The memory currently used by the service in bytes (MemoryCurrent). Requires the Systemd MemoryAccounting setting to be enabled. Shown as byte
systemd.service.restart_count (gauge)
The number of times the service has been restarted because of its Restart= setting (NRestarts). Requires Systemd version >= 235. Shown as time
systemd.service.task_count (gauge)
The current number of tasks in the service (TasksCurrent). Requires the Systemd TasksAccounting setting to be enabled. Shown as task
systemd.socket.connection_accepted_count (gauge)
The number of accepted socket connections (NAccepted). Shown as connection
systemd.socket.connection_count (gauge)
The current number of socket connections (NConnections). Shown as connection
systemd.socket.connection_refused_count (gauge)
The total number of refused socket connections (NRefused). Requires Systemd version >= 239. Shown as connection
systemd.unit.active (gauge)
Whether the unit is currently in the active state
systemd.unit.loaded (gauge)
Whether the unit is currently in the loaded state
systemd.unit.monitored (gauge)
Indicates that the unit is monitored (the value is always 1)
systemd.unit.uptime (gauge)
The unit uptime in seconds since its activation. Shown as second
systemd.units_by_state (gauge)
Units reported by state; sum by state to count the units in each state. Shown as unit
systemd.units_loaded_count (gauge)
The number of loaded units. Shown as unit
systemd.units_monitored_count (gauge)
The number of monitored units. Shown as unit
systemd.units_total (gauge)
The total number of units. Shown as unit
Some metrics are reported only if the corresponding Systemd accounting setting is enabled:
systemd.service.cpu_time_consumed requires Systemd configuration CPUAccounting to be enabled
systemd.service.memory_usage requires Systemd configuration MemoryAccounting to be enabled
systemd.service.task_count requires Systemd configuration TasksAccounting to be enabled
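Some metrics are only available from specific versions of Systemd:
systemd.service.cpu_time_consumed requires Systemd version >= 220
systemd.service.restart_count requires Systemd version >= 235
systemd.socket.connection_refused_count requires Systemd version >= 239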
Service Checks
systemd.can_connect
Returns OK if Systemd is reachable, CRITICAL otherwise. Statuses: ok, critical
systemd.system.state
Returns OK if Systemd’s system state is running. Returns CRITICAL if the state is degraded, maintenance, or stopping. Returns UNKNOWN if the state is initializing, starting, or any other value. Statuses: ok, critical, unknown
systemd.unit.state
Returns OK if the unit's active state is active. Returns CRITICAL if the state is inactive, deactivating, or failed. Returns UNKNOWN if the state is activating or any other value. Statuses: ok, critical, unknown
systemd.unit.substate
Returns OK, CRITICAL, or UNKNOWN based on the unit's substate and the user-provided mapping in systemd.d/conf.yaml. Statuses: ok, critical, unknown
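The systemd.unit.substate check relies on a substate-to-status mapping in systemd.d/conf.yaml. The sketch below assumes the option is named substate_status_mapping and is keyed by unit name. Verify the exact key and shape against the sample conf.yaml for your Agent version. The substates and statuses chosen here are illustrative.

```yaml
# conf.d/systemd.d/conf.yaml (sketch; option name and shape are assumptions)
init_config:

instances:
  - unit_names:
      - ssh.service
    # Assumed option: for each unit, map systemd substates to a service check status.
    substate_status_mapping:
      ssh.service:
        running: ok
        exited: critical
        dead: critical
```

With this mapping, ssh.service in the running substate would report OK, and the exited or dead substates would report CRITICAL.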