Container Cost Allocation

Overview

Datadog Cloud Cost Management (CCM) automatically allocates the costs of your cloud clusters to individual services and workloads running in those clusters. Use cost metrics enriched with tags from pods, nodes, containers, and tasks to visualize container workload cost in the context of your entire cloud bill.

Clouds
CCM allocates costs of your AWS, Azure, or Google Cloud host instances. A host is a computer (such as an EC2 instance in AWS, a virtual machine in Azure, or a Compute Engine instance in Google Cloud) that is listed in your cloud provider’s cost and usage report and may be running Kubernetes pods.
Resources
CCM allocates costs for Kubernetes clusters and includes cost analysis for many associated resources such as Kubernetes persistent volumes used by your pods.

On the Containers page, CCM displays costs for resources including CPU, memory, and more, depending on the cloud and orchestrator you are using.

[Figure: Cloud cost allocation table showing requests and idle costs over the past month on the Containers page]

Prerequisites

AWS

CCM allocates costs of AWS ECS clusters as well as all Kubernetes clusters, including those managed through Elastic Kubernetes Service (EKS).

The following table lists the features and the minimum Agent and Cluster Agent versions required for each.

| Feature | Minimum Agent version | Minimum Cluster Agent version |
| --- | --- | --- |
| Container Cost Allocation | 7.27.0 | 1.11.0 |
| GPU Container Cost Allocation | 7.54.0 | 7.54.0 |
| AWS Persistent Volume Allocation | 7.46.0 | 1.11.0 |
| Data Transfer Cost Allocation | 7.55.0 | 7.55.0 |

  1. Configure the AWS Cloud Cost Management integration on the Cloud Costs Setup page.
  2. For Kubernetes support, install the Datadog Agent in a Kubernetes environment and ensure that you enable the Orchestrator Explorer in your Agent configuration.
  3. For AWS ECS support, set up Datadog Container Monitoring in ECS tasks.
  4. Optionally, enable AWS Split Cost Allocation for usage-based ECS allocation.
  5. To enable GPU container cost allocation, install the Datadog DCGM integration.
  6. To enable data transfer cost allocation, set up Network Performance Monitoring. Note: Additional charges apply.
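As a quick way to read the AWS version table, the following sketch checks which features a given pair of Agent and Cluster Agent versions would satisfy. The helper names (`parse_version`, `supported_features`) and the sample versions are hypothetical and not part of any Datadog tooling; only the minimum versions come from the table.

```python
# Minimum versions copied from the AWS table: (Agent, Cluster Agent).
MIN_VERSIONS = {
    "Container Cost Allocation": ("7.27.0", "1.11.0"),
    "GPU Container Cost Allocation": ("7.54.0", "7.54.0"),
    "AWS Persistent Volume Allocation": ("7.46.0", "1.11.0"),
    "Data Transfer Cost Allocation": ("7.55.0", "7.55.0"),
}


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '7.46.0' into (7, 46, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def supported_features(agent: str, cluster_agent: str) -> list[str]:
    """Return the features whose minimum versions are both satisfied."""
    agent_v = parse_version(agent)
    cluster_v = parse_version(cluster_agent)
    return [
        feature
        for feature, (min_agent, min_cluster) in MIN_VERSIONS.items()
        if agent_v >= parse_version(min_agent)
        and cluster_v >= parse_version(min_cluster)
    ]


# Hypothetical installed versions, for illustration only.
features = supported_features("7.50.0", "1.11.0")
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating "7.9.0" as newer than "7.27.0".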

Azure

CCM allocates costs of all Kubernetes clusters, including those managed through Azure Kubernetes Service (AKS).

The following table lists the features and the minimum Agent and Cluster Agent versions required for each.

| Feature | Minimum Agent version | Minimum Cluster Agent version |
| --- | --- | --- |
| Container Cost Allocation | 7.27.0 | 1.11.0 |
| GPU Container Cost Allocation | 7.54.0 | 7.54.0 |

  1. Configure the Azure Cost Management integration on the Cloud Costs Setup page.
  2. Install the Datadog Agent in a Kubernetes environment and ensure that you enable the Orchestrator Explorer in your Agent configuration.
  3. To enable GPU container cost allocation, install the Datadog DCGM integration.

Google Cloud

CCM allocates costs of all Kubernetes clusters, including those managed through Google Kubernetes Engine (GKE).

The following table lists the features and the minimum Agent and Cluster Agent versions required for each.

| Feature | Minimum Agent version | Minimum Cluster Agent version |
| --- | --- | --- |
| Container Cost Allocation | 7.27.0 | 1.11.0 |
| GPU Container Cost Allocation | 7.54.0 | 7.54.0 |

  1. Configure the Google Cloud Cost Management integration on the Cloud Costs Setup page.
  2. Install the Datadog Agent in a Kubernetes environment and ensure that you enable the Orchestrator Explorer in your Agent configuration.
  3. To enable GPU container cost allocation, install the Datadog DCGM integration.

Allocate costs

Cost allocation divides host compute and other resource costs from your cloud provider among the individual tasks or pods associated with them. These divided costs are then enriched with tags from nodes, pods, tasks, and volumes, so you can break down costs by any associated dimension.

Use the allocated_resource tag to visualize the resource associated with your costs at various levels, including the Kubernetes node, container orchestration host, storage volume, or entire cluster level.

Compute (AWS)

For Kubernetes compute allocation, a Kubernetes node is joined with its associated host instance costs. The node’s cluster name and all node tags are added to the entire compute cost for the node. This allows you to associate cluster-level dimensions with the cost of the instance, without considering the pods scheduled to the node.

Next, Datadog looks at all of the pods running on that node for the day. The cost of the node is allocated to the pod based on the resources it has used and the length of time it ran. This calculated cost is enriched with all of the pod’s tags.
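The pod-level step can be pictured as sharing a node's daily cost in proportion to resource-hours. The sketch below is only an illustration under simplifying assumptions: it considers a single resource (CPU cores), and the `PodUsage` type, the `allocate_node_cost` function, and all numbers are hypothetical rather than Datadog's actual implementation.

```python
from dataclasses import dataclass

HOURS_PER_DAY = 24.0


@dataclass
class PodUsage:
    name: str
    cpu_cores: float      # cores attributed to the pod (hypothetical input)
    hours_running: float  # hours the pod ran on the node that day


def allocate_node_cost(
    node_daily_cost: float, node_cpu_cores: float, pods: list[PodUsage]
) -> dict[str, float]:
    """Give each pod the share of node cost matching its core-hours."""
    node_core_hours = node_cpu_cores * HOURS_PER_DAY
    return {
        pod.name: node_daily_cost * (pod.cpu_cores * pod.hours_running) / node_core_hours
        for pod in pods
    }


# An 8-core node costing 24.0 per day, with one all-day pod and one short-lived pod.
pods = [PodUsage("web", 2.0, 24.0), PodUsage("batch", 4.0, 6.0)]
costs = allocate_node_cost(24.0, 8.0, pods)
unallocated = 24.0 - sum(costs.values())
```

In this sketch, a pod that runs for only part of the day receives a proportionally smaller share, and whatever is not assigned to any pod remains at the node level.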

Note: Only tags from pods and nodes are added to cost metrics. To include labels, enable labels as tags for nodes and pods.

All other costs are given the same value and tags as the source metric aws.cost.amortized.

Persistent volume storage

For Kubernetes Persistent Volume storage allocation, Persistent Volumes (PV), Persistent Volume Claims (PVC), nodes, and pods are joined with their associated EBS volume costs. All associated PV, PVC, node, and pod tags are added to the EBS volume cost line items.

Next, Datadog looks at all of the pods that claimed the volume on that day. The cost of the volume is allocated to a pod based on the resources it used and the length of time it ran. These resources include the provisioned capacity for storage, IOPS, and throughput. This allocated cost is enriched with all of the pod’s tags.

AWS ECS on EC2

For ECS allocation, Datadog determines which tasks ran on each EC2 instance used for ECS. If you enable AWS Split Cost Allocation, the metrics allocate ECS costs by usage instead of reservation, providing more granular detail.

Based on the resources the task has used, Datadog assigns the appropriate portion of the instance’s compute cost to that task. The calculated cost is enriched with all of the task’s tags and with all tags (except container names) of the containers running in the task.

AWS ECS on Fargate

ECS tasks that run on Fargate are already fully allocated in the AWS Cost and Usage Report (CUR). CCM enriches that data by adding out-of-the-box tags and container tags to the AWS Fargate cost.

Data transfer

For Kubernetes data transfer allocation, a Kubernetes node is joined with its associated data transfer costs from the CUR. The node’s cluster name and all node tags are added to the entire data transfer cost for the node. This allows you to associate cluster-level dimensions with the cost of the data transfer, without considering the pods scheduled to the node.

Datadog supports data transfer cost allocation only through the six standard workload resources. If you use custom workload resources, their data transfer costs may only be allocated down to the cluster level and not to the node or namespace level.

Compute (Azure)

For Kubernetes compute allocation, a Kubernetes node is joined with its associated host instance costs. The node’s cluster name and all node tags are added to the entire compute cost for the node. This allows you to associate cluster-level dimensions with the cost of the instance, without considering the pods scheduled to the node.

Next, Datadog looks at all of the pods running on that node for the day. The cost of the node is allocated to the pod based on the resources it has used and the length of time it ran. This calculated cost is enriched with all of the pod’s tags.

Note: Only tags from pods and nodes are added to cost metrics. To include labels, enable labels as tags for nodes and pods.

All other costs are given the same value and tags as the source metric azure.cost.amortized.

Compute (Google Cloud)

For Kubernetes compute allocation, a Kubernetes node is joined with its associated host instance costs. The node’s cluster name and all node tags are added to the entire compute cost for the node. This allows you to associate cluster-level dimensions with the cost of the instance, without considering the pods scheduled to the node.

Next, Datadog looks at all of the pods running on that node for the day. The cost of the node is allocated to the pod based on the resources it has used and the length of time it ran. This calculated cost is enriched with all of the pod’s tags.

Note: Only tags from pods and nodes are added to cost metrics. To include labels, enable labels as tags for nodes and pods.

All other costs are given the same value and tags as the source metric gcp.cost.amortized.

Agentless Kubernetes costs

To view the costs of GKE clusters without enabling Datadog Infrastructure Monitoring, use GKE cost allocation. Enable GKE cost allocation on unmonitored GKE clusters to access this feature set.

Limitations and differences from the Datadog Agent

  • There is no support for tracking workload idle costs.
  • The costs of individual pods are not tracked; only the aggregated cost of a workload and its namespace is tracked. There is no pod_name tag.
  • GKE enriches data using pod labels only and ignores any Datadog tags you add.
  • The full list of limitations can be found in the official GKE documentation.

To enable GKE cost allocation, see the official GKE documentation.

Understanding spend

Use the allocated_spend_type tag to visualize the spend category associated with your costs at various levels, including the Kubernetes node, container orchestration host, storage volume, or entire cluster level.

Compute (AWS)

The cost of a host instance is split into two components: 60% for the CPU and 40% for the memory. If the host instance has GPUs, the cost is split into three components: 95% for the GPU, 3% for the CPU, and 2% for the memory. Each component is allocated to individual workloads based on their resource reservations and usage.
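The split described above is plain arithmetic. The following minimal sketch applies the stated percentages to a host cost; the function name `split_host_cost` and the sample cost are hypothetical.

```python
def split_host_cost(host_cost: float, has_gpu: bool = False) -> dict[str, float]:
    """Split a host instance cost into per-resource components.

    Uses the percentages stated in the text: 60/40 for CPU/memory,
    or 95/3/2 for GPU/CPU/memory when the host has GPUs.
    """
    if has_gpu:
        weights = {"gpu": 0.95, "cpu": 0.03, "memory": 0.02}
    else:
        weights = {"cpu": 0.60, "memory": 0.40}
    return {resource: host_cost * weight for resource, weight in weights.items()}


# Hypothetical daily host cost of 100.0.
cpu_only = split_host_cost(100.0)
with_gpu = split_host_cost(100.0, has_gpu=True)
```

Each component is then allocated to workloads separately, so a memory-heavy pod and a CPU-heavy pod on the same host can end up with different shares of each component.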

Costs are allocated into the following spend types:

| Spend type | Description |
| --- | --- |
| Usage | Cost of resources (such as memory, CPU, and GPU) used by workloads, based on the average usage on that day. |
| Workload idle | Cost of resources (such as memory, CPU, and GPU) that are reserved and allocated but not used by workloads. This is the difference between the total resources requested and the average usage. |
| Cluster idle | Cost of resources (such as memory, CPU, and GPU) that are not reserved by workloads in a cluster. This is the difference between the total cost of the resources and what is allocated to workloads. |
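To make the three spend types concrete, here is a small sketch for a single resource component on one host. It assumes a uniform unit cost across the host's capacity and, as an assumption not stated in the text, clamps workload idle at zero when average usage exceeds the request. The function name `classify_spend` and the numbers are hypothetical.

```python
def classify_spend(
    component_cost: float, capacity: float, requested: float, avg_used: float
) -> dict[str, float]:
    """Break one resource component's cost into the three spend types.

    capacity, requested, and avg_used share a unit (for example, CPU cores).
    Assumption: when usage exceeds the request, workload idle is zero and
    the workload is charged for what it used.
    """
    unit_cost = component_cost / capacity
    usage = avg_used * unit_cost
    workload_idle = max(requested - avg_used, 0.0) * unit_cost
    cluster_idle = component_cost - (usage + workload_idle)
    return {
        "usage": usage,
        "workload_idle": workload_idle,
        "cluster_idle": cluster_idle,
    }


# CPU component of 60.0 on an 8-core host; workloads request 6 cores and
# use 4.5 cores on average.
spend = classify_spend(60.0, capacity=8.0, requested=6.0, avg_used=4.5)
```

The three values always add up to the component cost, which is why the allocated metrics can still be read against the total bill.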

Persistent volume

The cost of an EBS volume has three components: IOPS, throughput, and storage. Each is allocated according to a pod’s usage when the volume is mounted.

| Spend type | Description |
| --- | --- |
| Usage | Cost of provisioned IOPS, throughput, or storage used by workloads. Storage cost is based on the maximum amount of volume storage used that day, while IOPS and throughput costs are based on the average amount of volume storage used that day. |
| Workload idle | Cost of provisioned IOPS, throughput, or storage that are reserved and allocated but not used by workloads. Storage cost is based on the maximum amount of volume storage used that day, while IOPS and throughput costs are based on the average amount of volume storage used that day. This is the difference between the total resources requested and the average usage. Note: This tag is only available if you have enabled Resource Collection in your AWS Integration. To prevent being charged for Cloud Security Posture Management, ensure that during the Resource Collection setup, the Cloud Security Posture Management box is unchecked. |
| Cluster idle | Cost of provisioned IOPS, throughput, or storage that are not reserved by any pods that day. This is the difference between the total cost of the resources and what is allocated to workloads. |

Note: Persistent volume allocation is only supported in Kubernetes clusters, and is only available for pods that are part of a Kubernetes StatefulSet.

Compute (Azure)

The cost of a host instance is split into two components: 60% for the CPU and 40% for the memory. If the host instance has GPUs, the cost is split into three components: 95% for the GPU, 3% for the CPU, and 2% for the memory. Each component is allocated to individual workloads based on their resource reservations and usage.

Costs are allocated into the following spend types:

| Spend type | Description |
| --- | --- |
| Usage | Cost of resources (such as memory, CPU, and GPU) used by workloads, based on the average usage on that day. |
| Workload idle | Cost of resources (such as memory, CPU, and GPU) that are reserved and allocated but not used by workloads. This is the difference between the total resources requested and the average usage. |
| Cluster idle | Cost of resources (such as memory, CPU, and GPU) that are not reserved by workloads in a cluster. This is the difference between the total cost of the resources and what is allocated to workloads. |

Compute (Google Cloud)

The cost of a host instance is split into two components: 60% for the CPU and 40% for the memory. If the host instance has GPUs, the cost is split into three components: 95% for the GPU, 3% for the CPU, and 2% for the memory. Each component is allocated to individual workloads based on their resource reservations and usage.

Costs are allocated into the following spend types:

| Spend type | Description |
| --- | --- |
| Usage | Cost of resources (such as memory, CPU, and GPU) used by workloads, based on the average usage on that day. |
| Workload idle | Cost of resources (such as memory, CPU, and GPU) that are reserved and allocated but not used by workloads. This is the difference between the total resources requested and the average usage. |
| Cluster idle | Cost of resources (such as memory, CPU, and GPU) that are not reserved by workloads in a cluster. This is the difference between the total cost of the resources and what is allocated to workloads. |
| Not monitored | Cost of resources where the spend type is unknown. To resolve this, install the Datadog Agent on these clusters or nodes. |

Understanding resources

Depending on the cloud provider, certain resources may or may not be available for cost allocation.

| Resource | Description | AWS | Azure | Google Cloud |
| --- | --- | --- | --- | --- |
| CPU | | | | |
| Memory | | | | |
| Persistent volumes | Storage resources within a cluster, provisioned by administrators or dynamically, that persist data independently of pod lifecycles. | | | |
| Managed service fees | Cost of associated fees charged by the cloud provider for managing the cluster, such as fees for managed Kubernetes services or other container orchestration options. | | | |
| ECS costs | | | N/A | N/A |
| Data transfer costs | | | Limited* | Limited* |
| GPU | | | | |
| Local storage | Directly-attached storage resources for a node. | | Limited* | Limited* |

Limited* resources have been identified as part of your Kubernetes spend, but are not fully allocated to specific workloads or pods. These resources are host-level costs, not pod or namespace-level costs, and are identified with allocated_spend_type:<resource>_not_supported.

Cost metrics

When the prerequisites are met, the following cost metrics automatically appear.

| Cost metric | Description |
| --- | --- |
| aws.cost.amortized.shared.resources.allocated | EC2 costs allocated by the CPU and memory used by a pod or ECS task, using a 60:40 split for CPU and memory respectively, or a 95:3:2 split for GPU, CPU, and memory respectively if a GPU is used by a pod. Also includes allocated EBS costs. Based on aws.cost.amortized. |
| aws.cost.net.amortized.shared.resources.allocated | Net EC2 costs allocated by the CPU and memory used by a pod or ECS task, using a 60:40 split for CPU and memory respectively, or a 95:3:2 split for GPU, CPU, and memory respectively if a GPU is used by a pod. Also includes allocated EBS costs. Based on aws.cost.net.amortized, if available. |
| azure.cost.amortized.shared.resources.allocated | Azure VM costs allocated by the CPU and memory used by a pod or container task, using a 60:40 split for CPU and memory respectively, or a 95:3:2 split for GPU, CPU, and memory respectively if a GPU is used by a pod. Also includes allocated Azure costs. Based on azure.cost.amortized. |
| gcp.cost.amortized.shared.resources.allocated | Google Compute Engine costs allocated by the CPU and memory used by a pod, using a 60:40 split for CPU and memory respectively, or a 95:3:2 split for GPU, CPU, and memory respectively if a GPU is used by a pod. This allocation method is used when the bill does not already provide a specific split between CPU and memory usage. Based on gcp.cost.amortized. |

These cost metrics include all of your cloud costs. This allows you to continue visualizing all of your cloud costs at one time.

For example, say you have the tag team on a storage bucket, a cloud provider managed database, and Kubernetes pods. You can use these metrics to group costs by team, which includes the costs for all three.
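The grouping in this example amounts to summing cost line items by a shared tag, regardless of which kind of resource produced them. The sketch below uses made-up line items and a hypothetical `cost_by_tag` helper; it does not query Datadog.

```python
from collections import defaultdict

# Hypothetical line items: a bucket, a managed database, and two pods.
line_items = [
    {"resource": "storage bucket", "tags": {"team": "payments"}, "cost": 12.5},
    {"resource": "managed database", "tags": {"team": "payments"}, "cost": 40.0},
    {"resource": "kubernetes pod", "tags": {"team": "payments"}, "cost": 7.25},
    {"resource": "kubernetes pod", "tags": {"team": "search"}, "cost": 3.0},
]


def cost_by_tag(items: list[dict], tag_key: str) -> dict[str, float]:
    """Sum costs per value of tag_key; untagged items are grouped under 'N/A'."""
    totals: dict[str, float] = defaultdict(float)
    for item in items:
        totals[item["tags"].get(tag_key, "N/A")] += item["cost"]
    return dict(totals)


totals = cost_by_tag(line_items, "team")
```

Because allocated container costs carry the same tag keys as other cloud resources, the pod rows fall into the same group as the bucket and the database.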

Applying tags

Datadog consolidates and applies the following tags from various sources to cost metrics.

Kubernetes (AWS)

In addition to Kubernetes pod and Kubernetes node tags, the following out-of-the-box tags (a non-exhaustive list) are applied to cost metrics:

| Out-of-the-box tag | Description |
| --- | --- |
| orchestrator:kubernetes | The orchestration platform associated with the item is Kubernetes. |
| kube_cluster_name | The name of the Kubernetes cluster. |
| kube_namespace | The namespace where workloads are running. |
| kube_deployment | The name of the Kubernetes Deployment. |
| kube_stateful_set | The name of the Kubernetes StatefulSet. |
| pod_name | The name of any individual pod. |

Conflicts are resolved by favoring higher-specificity tags such as pod tags over lower-specificity tags such as host tags. For example, a Kubernetes pod tagged service:datadog-agent running on a node tagged service:aws-node results in a final tag service:datadog-agent.
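The precedence rule can be sketched as a dictionary merge in which pod tags are applied last. This is a simplification that assumes one value per tag key; the `merge_tags` function and the extra tag values are hypothetical.

```python
def merge_tags(node_tags: dict[str, str], pod_tags: dict[str, str]) -> dict[str, str]:
    """Combine tags, letting higher-specificity pod tags win on conflicts."""
    # Later entries override earlier ones, so pod tags take precedence.
    return {**node_tags, **pod_tags}


node_tags = {"service": "aws-node", "kube_cluster_name": "prod"}
pod_tags = {"service": "datadog-agent", "kube_namespace": "monitoring"}
merged = merge_tags(node_tags, pod_tags)
```

Tags that exist only on the node, such as the cluster name here, are still carried over to the pod's cost.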

Persistent volume

In addition to Kubernetes pod and Kubernetes node tags, the following out-of-the-box tags are applied to cost metrics.

| Out-of-the-box tag | Description |
| --- | --- |
| persistent_volume_reclaim_policy | The Kubernetes Reclaim Policy on the Persistent Volume. |
| storage_class_name | The Kubernetes Storage Class used to instantiate the Persistent Volume. |
| volume_mode | The Volume Mode of the Persistent Volume. |
| ebs_volume_type | The type of the EBS volume. Can be gp3, gp2, or others. |

Amazon ECS

In addition to ECS task tags, the following out-of-the-box tags are applied to cost metrics.

Note: Most tags from ECS containers are applied (excluding container_name).

| Out-of-the-box tag | Description |
| --- | --- |
| orchestrator:ecs | The orchestration platform associated with the item is AWS ECS. |
| ecs_cluster_name | The name of the ECS cluster. |
| is_aws_ecs | All costs associated with running ECS. |
| is_aws_ecs_on_ec2 | All EC2 compute costs associated with running ECS on EC2. |
| is_aws_ecs_on_fargate | All costs associated with running ECS on Fargate. |

Data transfer

The following out-of-the-box tags are applied to cost metrics associated with Kubernetes workloads:

| Out-of-the-box tag | Description |
| --- | --- |
| source_availability_zone | The availability zone name where data transfer originated. |
| source_availability_zone_id | The availability zone ID where data transfer originated. |
| source_region | The region where data transfer originated. |
| destination_availability_zone | The availability zone name where data transfer was sent to. |
| destination_availability_zone_id | The availability zone ID where data transfer was sent to. |
| destination_region | The region where data transfer was sent to. |
| allocated_resource:data_transfer | The tracking and allocation of costs associated with data transfer activities. |

In addition, some Kubernetes pod tags that are common between all pods on the same node are also applied.

Kubernetes (Azure)

In addition to Kubernetes pod and Kubernetes node tags, the following out-of-the-box tags (a non-exhaustive list) are applied to cost metrics:

| Out-of-the-box tag | Description |
| --- | --- |
| orchestrator:kubernetes | The orchestration platform associated with the item is Kubernetes. |
| kube_cluster_name | The name of the Kubernetes cluster. |
| kube_namespace | The namespace where workloads are running. |
| kube_deployment | The name of the Kubernetes Deployment. |
| kube_stateful_set | The name of the Kubernetes StatefulSet. |
| pod_name | The name of any individual pod. |
| allocated_resource:data_transfer | The tracking and allocation of costs associated with data transfer activities used by Azure services or workloads. |
| allocated_resource:local_storage | The tracking and allocation of costs at a host level associated with local storage resources used by Azure services or workloads. |

Kubernetes (Google Cloud)

In addition to Kubernetes pod and Kubernetes node tags, the following out-of-the-box tags (a non-exhaustive list) are applied to cost metrics:

| Out-of-the-box tag | Description |
| --- | --- |
| orchestrator:kubernetes | The orchestration platform associated with the item is Kubernetes. |
| kube_cluster_name | The name of the Kubernetes cluster. |
| kube_namespace | The namespace where workloads are running. |
| kube_deployment | The name of the Kubernetes Deployment. |
| kube_stateful_set | The name of the Kubernetes StatefulSet. |
| pod_name | The name of any individual pod. |
| allocated_spend_type:not_monitored | The tracking and allocation of Agentless Kubernetes costs associated with resources used by Google Cloud services or workloads that the Datadog Agent is not monitoring. |
| allocated_resource:data_transfer | The tracking and allocation of costs associated with data transfer activities used by Google Cloud services or workloads. |
| allocated_resource:gpu | The tracking and allocation of costs at a host level associated with GPU resources used by Google Cloud services or workloads. |
| allocated_resource:local_storage | The tracking and allocation of costs at a host level associated with local storage resources used by Google Cloud services or workloads. |
