Datadog Kubernetes Autoscaling

Join the Preview!

Datadog Kubernetes Autoscaling is in Preview.

Datadog Kubernetes Autoscaling automates the scaling of your Kubernetes environments based on utilization metrics. This feature enables you to make changes to your Kubernetes environments from within Datadog.

How it works

Datadog Kubernetes Autoscaling provides cluster scaling observability and workload scaling recommendations and automation. Datadog uses real-time and historical utilization metrics to make recommendations. With data from Cloud Cost Management, Datadog can also make recommendations based on costs.

Automated workload scaling is powered by a DatadogPodAutoscaler custom resource that defines scaling behavior at the workload level.

Each cluster can have a maximum of 100 workloads optimized with Datadog Kubernetes Autoscaling.

During the Preview period, Preview users are granted access to Cloud Cost Management. For details, see Ingest cost data with Cloud Cost Management.

Compatibility

  • Distributions: This feature is compatible with all of Datadog’s supported Kubernetes distributions.
  • Cluster autoscaling: This feature works alongside cluster autoscaling solutions, such as Karpenter and Cluster Autoscaler.
  • Workload autoscaling: This feature is an alternative to Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA). Datadog recommends that you remove any HPAs or VPAs from a workload before you use Datadog Kubernetes Autoscaling to optimize it, as shown in the sketch after this list.
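
For example, before optimizing a workload, you can check for and remove an existing HPA or VPA with kubectl. This is a minimal sketch; the namespace and object names are placeholders, and the vpa resource is only available if the VPA custom resource definitions are installed in your cluster.

# List existing HPAs and VPAs in the workload's namespace
kubectl get hpa -n <NAMESPACE>
kubectl get vpa -n <NAMESPACE>

# Delete an autoscaler that targets the workload you plan to optimize with Datadog
kubectl delete hpa <HPA_NAME> -n <NAMESPACE>
kubectl delete vpa <VPA_NAME> -n <NAMESPACE>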

Requirements

  • Remote Configuration must be enabled for your organization. See Enabling Remote Configuration.
  • Helm, for updating your Datadog Agent
  • (For Datadog Operator users) kubectl CLI, for updating the Datadog Agent
  • The following user permissions:
    • Org Management (org_management)
    • API Keys Write (api_keys_write)
    • Workload Scaling Write (orchestration_workload_scaling_write)
During the Preview period, Preview users are granted access at an organization level...

Setup

Datadog Operator

  1. Ensure you are using Datadog Operator v1.8.0+. To upgrade your Datadog Operator:
helm upgrade datadog-operator datadog/datadog-operator 
  2. Add the following to your datadog-agent.yaml configuration file:
spec:
  features:
    orchestratorExplorer:
      customResources:
      - datadoghq.com/v1alpha1/datadogpodautoscalers
    autoscaling:
      workload:
        enabled: true
    eventCollection:
      unbundleEvents: true
  override:
    clusterAgent:
      image:
        tag: 7.58.1
    nodeAgent:
      image:
        tag: 7.58.1 # or 7.58.1-jmx
    clusterChecksRunner:
      image:
        tag: 7.58.1 # or 7.58.1-jmx
  3. Admission Controller is enabled by default with the Datadog Operator. If you disabled it, add the following lines to datadog-agent.yaml:
...
spec:
  features:
    admissionController:
      enabled: true
...
  4. Apply the updated datadog-agent.yaml configuration:
kubectl apply -n $DD_NAMESPACE -f datadog-agent.yaml
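
As an optional check (not part of the setup steps above), you can verify that the DatadogPodAutoscaler custom resource definition is available and that the Agent pods restarted with the new configuration. The CRD name below is inferred from the datadoghq.com/v1alpha1/datadogpodautoscalers resource referenced in the configuration; if it is not present immediately, it may only appear once workload autoscaling is active.

# Confirm the DatadogPodAutoscaler CRD is registered in the cluster
kubectl get crd datadogpodautoscalers.datadoghq.com

# Confirm the Cluster Agent and node Agent pods are running
kubectl get pods -n $DD_NAMESPACE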

Helm

  1. Add the following to your datadog-values.yaml configuration file:
datadog:
  orchestratorExplorer:
    customResources:
    - datadoghq.com/v1alpha1/datadogpodautoscalers
  autoscaling:
    workload:
      enabled: true
  kubernetesEvents:
    unbundleEvents: true
clusterAgent:
  image:
    tag: 7.58.1
agents:
  image:
    tag: 7.58.1 # or 7.58.1-jmx
clusterChecksRunner:
  image:
    tag: 7.58.1 # or 7.58.1-jmx
  2. Admission Controller is enabled by default in the Datadog Helm chart. If you disabled it, add the following lines to datadog-values.yaml:
...
clusterAgent:
  image:
    tag: 7.58.1
  admissionController:
    enabled: true
...
  3. Update your local Helm chart repositories:
helm repo update
  4. Redeploy the Datadog Agent with your updated datadog-values.yaml:
helm upgrade -f datadog-values.yaml <RELEASE_NAME> datadog/datadog
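
Optionally, you can confirm that the pods in your Datadog namespace picked up the expected image tags after the upgrade. This is a generic kubectl sketch; replace <NAMESPACE> with the namespace of your Datadog release.

# List each pod with the image of its first container to confirm the 7.58.1 tag rolled out
kubectl get pods -n <NAMESPACE> -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}'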

Ingest cost data with Cloud Cost Management

Datadog Kubernetes Autoscaling can work with Cloud Cost Management to make workload scaling recommendations based on cost data.

Kubernetes Autoscaling Preview users are granted limited access to Cloud Cost Management during the Preview period. To coordinate this trial access, contact your customer success manager and CC kubernetes-beta@datadoghq.com.

If you are already using Cloud Cost Management, no action is required.

See Cloud Cost setup instructions for AWS, Azure, or Google Cloud.

If you do not enable Cloud Cost Management, workload recommendations and autoscaling still function; recommendations are then based on utilization metrics alone.

Usage

Identifying resources to scale

In Datadog, navigate to Containers > Kubernetes Explorer and select the Autoscaling tab. Use the Cluster Scaling view to see a list of your clusters, sortable by total idle CPU or total idle memory. If you enabled Cloud Cost Management, you can also see cost information and a trailing 30-day cost breakdown.

Screenshot: Infrastructure > Containers > Kubernetes Explorer > Autoscaling > Cluster Scaling in Datadog. A table of clusters displays each cluster's idle CPU, idle memory, and costs, with an 'Optimize Cluster' option per cluster.

Click Optimize cluster to open a detailed view of the selected cluster, including a table of this cluster’s workloads.

Screenshot: a detailed cluster view. At the top, widgets display cost metrics and scaling events; below, a table of the cluster's workloads shows each deployment's idle CPU, idle memory, and costs, with an 'Optimize' option per workload.

You can also use the Workload Scaling view to see a filterable list of all workloads across all clusters.

Select a workload and click Optimize to see its Scaling Recommendations. You can inspect the metrics backing the recommendation for each container within the deployment.

Screenshot: a side panel opened over the detailed cluster view. A section titled 'Scaling Recommendation' displays the text 'Set Memory on 2 containers and decrease replicas to 5'.

Deploying recommendations

You can deploy scaling recommendations:

  • automatically, with Datadog Kubernetes Autoscaling.

    Select Enable Autoscaling to automatically apply your recommendations.

  • manually, with kubectl patch.

    Select Apply to see a generated kubectl patch command. An illustrative sketch follows this list.
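
The exact command is generated for you in the UI. As an illustration only, a patch that sets memory on a container and decreases replicas (as in the recommendation shown above) could look like the following; the deployment, namespace, container name, and values are hypothetical.

# Hypothetical example of a generated patch: set memory requests/limits and decrease replicas
# (uses kubectl's default strategic merge patch, which merges containers by name)
kubectl patch deployment <DEPLOYMENT_NAME> -n <NAMESPACE> --patch '
spec:
  replicas: 5
  template:
    spec:
      containers:
        - name: <CONTAINER_NAME>
          resources:
            requests:
              memory: "512Mi"
            limits:
              memory: "512Mi"'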

Autoscale a workload with a custom resource

You can also deploy a DatadogPodAutoscaler custom resource to enable autoscaling for a workload. This custom resource targets a deployment.

For example:

apiVersion: datadoghq.com/v1alpha1
kind: DatadogPodAutoscaler
metadata:
  name: <name> # usually the same as your deployment object name
spec:
  constraints:
    # Adjust constraints as safeguards
    maxReplicas: 50
    minReplicas: 1 
  
  owner: Local
  policy: All
    # Values: All, None
    #   All - Allows automated recommendations to be applied. Default.
    #   None - Computes recommendations without applying them (dry run).

  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: <your Deployment name>
  targets:
    # Currently, the recommendation is to use a single target based on the CPU utilization of the pod's main container.
    - type: ContainerResource
      containerResource:
        container: <main-container-name>
        name: cpu
        value:
          type: Utilization
          utilization: 75
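
A DatadogPodAutoscaler manifest like the one above is applied like any other Kubernetes resource. The file name below is a placeholder:

# Apply the manifest and inspect the resulting DatadogPodAutoscaler object
kubectl apply -f datadog-pod-autoscaler.yaml
kubectl get datadogpodautoscalers -n <NAMESPACE>
kubectl describe datadogpodautoscalers <NAME> -n <NAMESPACE>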

Further reading
