OpenShift

Supported OS: Linux

Overview

Red Hat OpenShift is an open source container application platform based on the Kubernetes container orchestrator for enterprise application development and deployment.

This README describes the necessary configuration to enable collection of OpenShift-specific metrics in the Agent. Data described here are collected by the kubernetes_apiserver check. You must configure the check to collect the openshift.* metrics.

Setup

Installation

This core configuration supports OpenShift 3.11 and OpenShift 4, but it works best with OpenShift 4.

To install the Agent, see the Agent installation instructions for general Kubernetes instructions and the Kubernetes Distributions page for OpenShift configuration examples.

Alternatively, the Datadog Operator can be used to install and manage the Datadog Agent. The Datadog Operator can be installed using OpenShift’s OperatorHub.

Security Context Constraints configuration

If you are deploying the Datadog Agent using any of the methods linked in the installation instructions above, you must include Security Context Constraints (SCCs) for the Agent and Cluster Agent to collect data. Follow the instructions below as they relate to your deployment.

For instructions on how to install the Datadog Operator and DatadogAgent resource in OpenShift, see the OpenShift installation guide.

If you deploy the Operator with Operator Lifecycle Manager (OLM), then the necessary default SCCs present in OpenShift are automatically associated with the datadog-agent-scc Service Account. You can then deploy the Datadog components with the DatadogAgent CustomResourceDefinition, referencing this Service Account on the Node Agent and Cluster Agent pods.
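
As a minimal sketch of that reference, the DatadogAgent resource below sets the Service Account on both components through the override section of the v2alpha1 API. The resource name datadog is hypothetical, and credentials and other global settings are omitted. Check the field names against the Operator version you run.

```yaml
# Sketch only: assumes the Operator was installed through OLM and that the
# DatadogAgent v2alpha1 API is in use. Credentials and other settings omitted.
apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog                              # hypothetical resource name
spec:
  override:
    nodeAgent:
      serviceAccountName: datadog-agent-scc  # Service Account named in the text
    clusterAgent:
      serviceAccountName: datadog-agent-scc
```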

See the Distributions page and the Operator repo for more examples.

If you deploy with Helm, you can create the SCC directly within your Datadog Agent’s values.yaml. Add the following parameters under the agents and clusterAgent sections to create their respective SCCs.

datadog:
  #(...)

agents:
  podSecurity:
    securityContextConstraints:
      create: true

clusterAgent:
  podSecurity:
    securityContextConstraints:
      create: true

You can apply this when you initially deploy the Agent, or you can execute a helm upgrade after making this change to apply the SCC.

See the Distributions page and the Helm repo for more examples.

Depending on your needs and the security constraints of your cluster, three deployment scenarios are supported:

| Security Context Constraints   | Restricted    | Host network  | Custom    |
|--------------------------------|---------------|---------------|-----------|
| Kubernetes layer monitoring    | Supported     | Supported     | Supported |
| Kubernetes-based Autodiscovery | Supported     | Supported     | Supported |
| DogStatsD intake               | Not supported | Supported     | Supported |
| APM trace intake               | Not supported | Supported     | Supported |
| Logs network intake            | Not supported | Supported     | Supported |
| Host network metrics           | Not supported | Supported     | Supported |
| Docker layer monitoring        | Not supported | Not supported | Supported |
| Container logs collection      | Not supported | Not supported | Supported |
| Live Container monitoring      | Not supported | Not supported | Supported |
| Live Process monitoring        | Not supported | Not supported | Supported |

Restricted SCC operations

This mode does not require granting special permissions to the datadog-agent DaemonSet, other than the RBAC permissions needed to access the kubelet and the API server. You can get started with this kubelet-only template.

The recommended ingestion method for DogStatsD, APM, and logs is to bind the Datadog Agent to a host port. This way, the target IP is constant and easily discoverable by your applications. The default restricted OpenShift SCC does not allow binding to a host port. You can set the Agent to listen on its own IP, but you then need to handle the discovery of that IP from your application.

The Agent also supports a sidecar run mode, in which the Agent runs inside your application’s pod for easier discoverability.

Host

Add the allowHostPorts permission to the pod with the standard hostnetwork or hostaccess SCC, or by creating your own. In this case, you can add the relevant port bindings in your pod specs:

ports:
  - containerPort: 8125
    name: dogstatsdport
    protocol: UDP
  - containerPort: 8126
    name: traceport
    protocol: TCP
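
With the Agent bound to host ports, an application pod can find it through the node's IP. The fragment below is a sketch of an application container spec, not part of the Agent manifest. It assumes your tracing or DogStatsD client library reads the DD_AGENT_HOST environment variable, and it uses the Kubernetes downward API field status.hostIP to fill it in.

```yaml
# Sketch of an application container spec fragment (an illustration, not
# taken from the Agent manifests). Assumes the client library honors
# DD_AGENT_HOST.
env:
  - name: DD_AGENT_HOST
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP   # IP of the node the pod is scheduled on
```

The application then sends DogStatsD traffic to port 8125 (UDP) and traces to port 8126 (TCP) on that address, matching the port bindings above.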

Custom Datadog SCC for all features

The Helm Chart and Datadog Operator manage the SCC for you by default. To manage it yourself instead, make sure to include the correct configurations based on the features you have enabled.

If SELinux is in permissive mode or disabled, enable the hostaccess SCC to benefit from all features. If SELinux is in enforcing mode, Datadog recommends granting the spc_t type to the datadog-agent pod. To deploy the Agent, you can use the following datadog-agent SCC, which can be applied after creating the datadog-agent service account. It grants the following permissions:

  • allowHostPorts: true binds the DogStatsD, APM, and logs intakes to the node’s IP.
  • allowHostPID: true enables Origin Detection for DogStatsD metrics submitted by Unix Socket.
  • volumes: hostPath gives access to the Docker socket and the host’s proc and cgroup folders, for metric collection.
  • SELinux type: spc_t gives access to the Docker socket and all processes’ proc and cgroup folders, for metric collection. See Introducing a Super Privileged Container Concept for more details.

Add the datadog-agent service account to the newly created datadog-agent SCC by adding system:serviceaccount:<namespace>:<service account name> to the users section.

OpenShift 4.0+: If you used the OpenShift installer on a supported cloud provider, you must deploy the SCC with allowHostNetwork: true in the scc.yaml manifest, as well as hostNetwork: true in the Agent configuration, to get host tags and aliases. Access to metadata servers from the Pod network is otherwise restricted.
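
The permissions listed above can be sketched as an SCC manifest along the following lines. This is an illustration, not the official manifest linked above. It assumes a service account named datadog-agent in the default namespace, and the strategy fields at the bottom (runAsUser, fsGroup, supplementalGroups) are placeholder choices that you should adapt to your cluster's policy.

```yaml
# Illustrative sketch of a custom SCC; adapt before applying.
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: datadog-agent
users:
  # Assumption: service account "datadog-agent" in namespace "default".
  - system:serviceaccount:default:datadog-agent
allowHostPorts: true             # bind DogStatsD / APM / logs intakes to the node IP
allowHostPID: true               # Origin Detection over Unix Socket
allowHostNetwork: true           # needed on OpenShift 4.0+ for host tags and aliases
allowHostDirVolumePlugin: true   # required for hostPath volumes
allowHostIPC: false
allowPrivilegedContainer: false
readOnlyRootFilesystem: false
volumes:
  - configMap
  - downwardAPI
  - emptyDir
  - hostPath                     # Docker socket, host proc and cgroup folders
  - projected                    # assumption: used for service account tokens
  - secret
seLinuxContext:
  type: MustRunAs
  seLinuxOptions:
    user: system_u
    role: system_r
    type: spc_t                  # super privileged container type
    level: s0
runAsUser:
  type: MustRunAsRange           # placeholder strategy
fsGroup:
  type: MustRunAs                # placeholder strategy
supplementalGroups:
  type: RunAsAny                 # placeholder strategy
```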

Note: The Docker socket is owned by the root group, so you may need to elevate the Agent’s privileges to pull in Docker metrics. To run the Agent process as a root user, you can configure your SCC with the following:

runAsUser:
  type: RunAsAny

Log collection

The Datadog Agent’s log collection is set up in OpenShift largely the same way as in other Kubernetes clusters. The Datadog Operator and Helm Chart mount the /var/log/pods directory, which the Datadog Agent pod uses to monitor the logs of the pods and containers on its host. However, with the Datadog Operator, you need to apply additional SELinux options to give the Agent permission to read these log files.
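
As a sketch of those SELinux options with the Operator, the DatadogAgent fragment below sets them on the Node Agent pod through the v2alpha1 override section. The resource name datadog is hypothetical, and the spc_t values are an assumption carried over from the custom SCC description above. Verify them against the Distributions page for your version.

```yaml
# Sketch only: assumes the DatadogAgent v2alpha1 API. Credentials omitted.
apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog                  # hypothetical resource name
spec:
  features:
    logCollection:
      enabled: true
  override:
    nodeAgent:
      securityContext:
        seLinuxOptions:          # assumed values, matching the spc_t type above
          user: system_u
          role: system_r
          type: spc_t
          level: s0
```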

See Kubernetes Log Collection for further general information and the Distributions page for configuration examples.

APM

In Kubernetes, there are three main options to route the data from the application pod to the Datadog Agent pod: the Unix Domain Socket (UDS), the HostIP:HostPort option (TCP/IP), and the Kubernetes Service. The Datadog Operator and Helm Chart default to the UDS option because it is the most resource-efficient. However, this option doesn’t work well in OpenShift, as it requires elevated SCC and SELinux options in both the Agent pod and the application pod.

Datadog recommends explicitly disabling the UDS option, both to avoid these requirements and to prevent the Admission Controller from injecting the UDS configuration into application pods.
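
For a Helm deployment, disabling UDS in favor of the host port could look like the values.yaml fragment below. This is a sketch that assumes the chart's datadog.apm.portEnabled and datadog.apm.socketEnabled parameters. Confirm both against the chart version you deploy.

```yaml
# values.yaml fragment (sketch): route traces over HostIP:HostPort instead of UDS.
datadog:
  apm:
    portEnabled: true      # expose the trace port (8126/TCP) on the host
    socketEnabled: false   # turn off the Unix Domain Socket option
```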

See Kubernetes APM - Trace Collection for further general information and the Distributions page for configuration examples.

Validation

See the kubernetes_apiserver check.

Data Collected

Metrics

openshift.appliedclusterquota.cpu.limit
(gauge)
Hard limit for cpu by cluster resource quota and namespace
Shown as cpu
openshift.appliedclusterquota.cpu.remaining
(gauge)
Remaining available cpu by cluster resource quota and namespace
Shown as cpu
openshift.appliedclusterquota.cpu.used
(gauge)
Observed cpu usage by cluster resource quota and namespace
Shown as cpu
openshift.appliedclusterquota.memory.limit
(gauge)
Hard limit for memory by cluster resource quota and namespace
Shown as byte
openshift.appliedclusterquota.memory.remaining
(gauge)
Remaining available memory by cluster resource quota and namespace
Shown as byte
openshift.appliedclusterquota.memory.used
(gauge)
Observed memory usage by cluster resource quota and namespace
Shown as byte
openshift.appliedclusterquota.persistentvolumeclaims.limit
(gauge)
Hard limit for persistent volume claims by cluster resource quota and namespace
openshift.appliedclusterquota.persistentvolumeclaims.remaining
(gauge)
Remaining available persistent volume claims by cluster resource quota and namespace
openshift.appliedclusterquota.persistentvolumeclaims.used
(gauge)
Observed persistent volume claims usage by cluster resource quota and namespace
openshift.appliedclusterquota.pods.limit
(gauge)
Hard limit for pods by cluster resource quota and namespace
openshift.appliedclusterquota.pods.remaining
(gauge)
Remaining available pods by cluster resource quota and namespace
openshift.appliedclusterquota.pods.used
(gauge)
Observed pods usage by cluster resource quota and namespace
openshift.appliedclusterquota.services.limit
(gauge)
Hard limit for services by cluster resource quota and namespace
openshift.appliedclusterquota.services.loadbalancers.limit
(gauge)
Hard limit for service load balancers by cluster resource quota and namespace
openshift.appliedclusterquota.services.loadbalancers.remaining
(gauge)
Remaining available service load balancers by cluster resource quota and namespace
openshift.appliedclusterquota.services.loadbalancers.used
(gauge)
Observed service load balancers usage by cluster resource quota and namespace
openshift.appliedclusterquota.services.nodeports.limit
(gauge)
Hard limit for service node ports by cluster resource quota and namespace
openshift.appliedclusterquota.services.nodeports.remaining
(gauge)
Remaining available service node ports by cluster resource quota and namespace
openshift.appliedclusterquota.services.nodeports.used
(gauge)
Observed service node ports usage by cluster resource quota and namespace
openshift.appliedclusterquota.services.remaining
(gauge)
Remaining available services by cluster resource quota and namespace
openshift.appliedclusterquota.services.used
(gauge)
Observed services usage by cluster resource quota and namespace
openshift.clusterquota.cpu.limit
(gauge)
Hard limit for cpu by cluster resource quota for all namespaces
Shown as cpu
openshift.clusterquota.cpu.remaining
(gauge)
Remaining available cpu by cluster resource quota for all namespaces
Shown as cpu
openshift.clusterquota.cpu.requests.used
(gauge)
Observed cpu usage by cluster resource for request
openshift.clusterquota.cpu.used
(gauge)
Observed cpu usage by cluster resource quota for all namespaces
Shown as cpu
openshift.clusterquota.memory.limit
(gauge)
Hard limit for memory by cluster resource quota for all namespaces
Shown as byte
openshift.clusterquota.memory.remaining
(gauge)
Remaining available memory by cluster resource quota for all namespaces
Shown as byte
openshift.clusterquota.memory.used
(gauge)
Observed memory usage by cluster resource quota for all namespaces
Shown as byte
openshift.clusterquota.persistentvolumeclaims.limit
(gauge)
Hard limit for persistent volume claims by cluster resource quota for all namespaces
openshift.clusterquota.persistentvolumeclaims.remaining
(gauge)
Remaining available persistent volume claims by cluster resource quota for all namespaces
openshift.clusterquota.persistentvolumeclaims.used
(gauge)
Observed persistent volume claims usage by cluster resource quota for all namespaces
openshift.clusterquota.pods.limit
(gauge)
Hard limit for pods by cluster resource quota for all namespaces
openshift.clusterquota.pods.remaining
(gauge)
Remaining available pods by cluster resource quota for all namespaces
openshift.clusterquota.pods.used
(gauge)
Observed pods usage by cluster resource quota for all namespaces
openshift.clusterquota.services.limit
(gauge)
Hard limit for services by cluster resource quota for all namespaces
openshift.clusterquota.services.loadbalancers.limit
(gauge)
Hard limit for service load balancers by cluster resource quota for all namespaces
openshift.clusterquota.services.loadbalancers.remaining
(gauge)
Remaining available service load balancers by cluster resource quota for all namespaces
openshift.clusterquota.services.loadbalancers.used
(gauge)
Observed service load balancers usage by cluster resource quota for all namespaces
openshift.clusterquota.services.nodeports.limit
(gauge)
Hard limit for service node ports by cluster resource quota for all namespaces
openshift.clusterquota.services.nodeports.remaining
(gauge)
Remaining available service node ports by cluster resource quota for all namespaces
openshift.clusterquota.services.nodeports.used
(gauge)
Observed service node ports usage by cluster resource quota for all namespaces
openshift.clusterquota.services.remaining
(gauge)
Remaining available services by cluster resource quota for all namespaces
openshift.clusterquota.services.used
(gauge)
Observed services usage by cluster resource quota for all namespaces
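
For orientation, the metrics above describe OpenShift ClusterResourceQuota objects. The manifest below is a hypothetical example of such an object. The quota name, selector annotation, and limits are invented for illustration, and the way each hard limit maps to a metric name is an assumption based on the descriptions above.

```yaml
# Hypothetical ClusterResourceQuota; all names and values are illustrative.
apiVersion: quota.openshift.io/v1
kind: ClusterResourceQuota
metadata:
  name: team-a-quota                    # hypothetical name
spec:
  selector:
    annotations:
      openshift.io/requester: team-a    # hypothetical selector
  quota:
    hard:
      cpu: "4"                          # presumably openshift.clusterquota.cpu.limit
      memory: 8Gi                       # presumably openshift.clusterquota.memory.limit
      pods: "10"
      services: "5"
      services.loadbalancers: "2"
      services.nodeports: "2"
      persistentvolumeclaims: "4"
```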

Events

The OpenShift check does not include any events.

Service Checks

The OpenShift check does not include any Service Checks.

Troubleshooting

Need help? Contact Datadog support.
