Kubernetes Autoscaling

Datadog Kubernetes Autoscaling continuously monitors your Kubernetes resources to provide immediate scaling recommendations and multidimensional autoscaling of your Kubernetes workloads. You can deploy autoscaling through the Datadog web interface, or with a DatadogPodAutoscaler custom resource.

How it works

Datadog uses real-time and historical utilization metrics and event signals from your existing Datadog Agents to make recommendations. You can then examine these recommendations and choose to deploy them.

By default, Datadog Kubernetes Autoscaling uses estimated CPU and memory cost values to show savings opportunities and impact estimates. You can also use Kubernetes Autoscaling alongside Cloud Cost Management to get reporting based on your exact instance type costs.

Automated workload scaling is powered by a DatadogPodAutoscaler custom resource that defines scaling behavior on a per-workload level. The Datadog Cluster Agent acts as the controller for this custom resource.

Each cluster can have a maximum of 1,000 workloads optimized with Datadog Kubernetes Autoscaling.

Compatibility

  • Distributions: This feature is compatible with all of Datadog’s supported Kubernetes distributions.
  • Workload autoscaling: This feature is an alternative to the Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA). Datadog recommends that you remove any HPAs or VPAs from a workload before you use Datadog Kubernetes Autoscaling to optimize it. Workloads that already have an HPA or VPA are flagged for you in the application; you can also check from the command line, as shown below.
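
For example, you can list existing HPAs and VPAs with standard kubectl queries before enabling Autoscaling. These are generic Kubernetes commands, not Datadog-specific, and the VPA resource type exists only if the Vertical Pod Autoscaler CRD is installed in your cluster:
kubectl get horizontalpodautoscalers --all-namespaces
kubectl get verticalpodautoscalers --all-namespaces   # only if the VPA CRD is installed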

Requirements

  • Remote Configuration must be enabled for your organization. See Enabling Remote Configuration.
  • Helm, for updating your Datadog Agent.
  • (For Datadog Operator users) kubectl CLI, for updating the Datadog Agent.
  • The following user permissions:
    • Org Management (required for Remote Configuration)
    • API Keys Write (required for Remote Configuration)
    • Workload Scaling Read
    • Workload Scaling Write
    • Autoscaling Manage

Setup

If you use the Datadog Operator:

  1. Ensure you are using Datadog Operator v1.8.0+. To upgrade your Datadog Operator:
helm upgrade datadog-operator datadog/datadog-operator 
  2. Add the following to your datadog-agent.yaml configuration file:
spec:
  features:
    orchestratorExplorer:
      customResources:
      - datadoghq.com/v1alpha1/datadogpodautoscalers
    autoscaling:
      workload:
        enabled: true
    eventCollection:
      unbundleEvents: true
  override:
    clusterAgent:
      image:
        tag: 7.58.1
    nodeAgent:
      image:
        tag: 7.58.1 # or 7.58.1-jmx
    clusterChecksRunner:
      image:
        tag: 7.58.1 # or 7.58.1-jmx
  3. Admission Controller is enabled by default with the Datadog Operator. If you disabled it, re-enable it by adding the following lines to datadog-agent.yaml:
...
spec:
  features:
    admissionController:
      enabled: true
...
  4. Apply the updated datadog-agent.yaml configuration:
kubectl apply -n $DD_NAMESPACE -f datadog-agent.yaml
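
Optionally, you can confirm that the DatadogPodAutoscaler custom resource definition is available in the cluster. The CRD name here is inferred from the group and plural used in the orchestratorExplorer configuration above:
kubectl get crd datadogpodautoscalers.datadoghq.com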

If you use Helm:

  1. Add the following to your datadog-values.yaml configuration file:
datadog:
  orchestratorExplorer:
    customResources:
    - datadoghq.com/v1alpha1/datadogpodautoscalers
  autoscaling:
    workload:
      enabled: true
  kubernetesEvents:
    unbundleEvents: true
clusterAgent:
  image:
    tag: 7.58.1
agents:
  image:
    tag: 7.58.1 # or 7.58.1-jmx
clusterChecksRunner:
  image:
    tag: 7.58.1 # or 7.58.1-jmx
  2. Admission Controller is enabled by default in the Datadog Helm chart. If you disabled it, re-enable it by adding the following lines to datadog-values.yaml:
...
clusterAgent:
  image:
    tag: 7.58.1
  admissionController:
    enabled: true
...
  3. Update your local Helm chart repository:
helm repo update
  4. Redeploy the Datadog Agent with your updated datadog-values.yaml:
helm upgrade -f datadog-values.yaml <RELEASE_NAME> datadog/datadog
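
Optionally, verify that the Cluster Agent rolled out with the new configuration and that the DatadogPodAutoscaler resource type is queryable. The deployment and namespace names below are placeholders that depend on your installation:
kubectl rollout status deployment/<CLUSTER_AGENT_DEPLOYMENT> -n <NAMESPACE>
kubectl get datadogpodautoscalers --all-namespaces   # empty until Autoscaling is enabled for a workload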

Ingest cost data with Cloud Cost Management

By default, Datadog Kubernetes Autoscaling shows idle cost and savings estimates using the following fixed values:

  • $0.0295 per CPU core hour
  • $0.0053 per memory GB hour
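
For example, under these defaults a workload that leaves 2 CPU cores and 4 GB of memory idle would be estimated at roughly (2 × $0.0295) + (4 × $0.0053) ≈ $0.08 per hour of idle cost.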

Fixed cost values are subject to refinement over time.

When Cloud Cost Management is enabled in your organization, Datadog Kubernetes Autoscaling shows idle cost and savings estimates based on the exact billed cost of the underlying monitored instances.

See Cloud Cost setup instructions for AWS, Azure, or Google Cloud.

Cost data enhances Kubernetes Autoscaling, but it is not required. All of Datadog’s workload recommendations and autoscaling decisions are valid and functional without cost data.

Usage

Identify resources to rightsize

The Autoscaling Summary page provides a starting point for platform teams to understand the total Kubernetes resource savings opportunities across an organization and to filter down to key clusters and namespaces. The Cluster Scaling view provides per-cluster information about total idle CPU, total idle memory, and costs. Click a cluster for detailed information and a table of the cluster’s workloads. If you are an individual application or service owner, you can also filter by your team or service name directly from the Workload Scaling list view.

Click Optimize on any workload to see its scaling recommendation.

Deploy Autoscaling

After you identify a workload to optimize, Datadog recommends inspecting its Scaling Recommendation. You can also click Configure Recommendation to add constraints or adjust target utilization levels.

When you are ready to proceed with enabling Autoscaling for a workload, you have two options for deployment:

  • Click Enable Autoscaling. (Requires Workload Scaling Write permission.)

    Datadog automatically installs and configures autoscaling for this workload on your behalf.

  • Deploy a DatadogPodAutoscaler custom resource.

    Use your existing deploy process to target and configure Autoscaling for your workload. Click Export Recommendation to see a suggested manifest configuration.
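
If you manage the custom resource through your own deploy process, a minimal manifest might look like the following sketch. The apiVersion matches the custom resource referenced in the Setup section, but the workload name, namespace, owner, and replica constraints are placeholder assumptions; prefer the manifest that Export Recommendation generates for your workload.
apiVersion: datadoghq.com/v1alpha1
kind: DatadogPodAutoscaler
metadata:
  name: <WORKLOAD_NAME>-autoscaler
  namespace: <NAMESPACE>
spec:
  targetRef:            # the workload to scale
    apiVersion: apps/v1
    kind: Deployment
    name: <WORKLOAD_NAME>
  owner: Local          # assumption: managed by your deploy process rather than from the Datadog UI
  constraints:
    minReplicas: 1      # placeholder bounds; adjust to your workload
    maxReplicas: 10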

Deploy recommendations manually

As an alternative to Autoscaling, you can also deploy Datadog’s scaling recommendations manually. When you configure resources for your Kubernetes deployments, use the values suggested in the scaling recommendations. You can also click Export Recommendation to see a generated kubectl patch command.
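
For example, a manual update of this kind might resemble the following strategic-merge patch. The deployment, namespace, container, and resource values are placeholders; use the command and values that Export Recommendation generates for your workload. Note that changing resource requests triggers a rolling update of the Deployment.
kubectl patch deployment <DEPLOYMENT_NAME> -n <NAMESPACE> --type strategic -p '{"spec":{"template":{"spec":{"containers":[{"name":"<CONTAINER_NAME>","resources":{"requests":{"cpu":"500m","memory":"512Mi"}}}]}}}}'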

Further reading
