Cloud-based applications can generate massive amounts of data and large observability costs, ultimately placing pressure on organizations to reduce this budget line item. To reduce observability costs, many teams resort to collecting fewer metrics; however, for centralized SRE and observability teams, effective custom metrics governance should increase monitoring efficiency rather than cut visibility entirely.
This guide provides best practices for managing your custom metrics volumes through the three key components of effective metrics governance: Visibility and Attribution, Actionable Custom Metrics Governance, and Monitoring and Prevention. For each component, the guide shows which Datadog tools help keep observability cost-effective. You'll learn how to:
This guide assumes you have an understanding of the following concepts in custom metrics:
The first step in managing your custom metrics volumes and costs is understanding what the key metric cost drivers are and attributing those drivers to their respective owners.
Follow the steps in this section to review your account's total monthly metric usage and see a breakdown of that usage by team or by a tag key of your choice.
The Plan and Usage page provides an out-of-the-box (OOTB) summary of your account's monthly billable custom metrics usage, with detailed insights into your costs, burn rate, and Top Custom Metric names.
Knowing which metrics contribute most to your account's monthly usage and costs is the recommended starting point for using Metrics without Limits™. With this knowledge, you can find the source of these metric submissions, whether by team, service, organization, or another tag attribute. Additionally, review Usage Attribution for a full breakdown of your account's billable usage by tag key. From there, you can identify your largest cost drivers by tags such as team, service, or application.
Note: Usage Attribution is an advanced feature included in the Enterprise plan. For all other plans, contact your account representative or Customer Success to request this feature.
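Once usage is broken down by tag, attributing cost drivers to owners is a simple aggregation. The following is a minimal sketch with made-up numbers (in practice the volumes would come from Usage Attribution or Metrics Volume Management): it sums each metric's volume under its team tag and keeps untagged metrics in a hypothetical "unattributed" bucket so that gaps in tagging stay visible.

```python
from collections import defaultdict

# Hypothetical per-metric volumes: (metric name, owning team tag, indexed custom metrics).
usage = [
    ("checkout.latency", "team:payments", 12_000),
    ("checkout.errors", "team:payments", 3_000),
    ("search.qps", "team:search", 8_000),
    ("legacy.debug", None, 5_000),  # untagged: no owner can be attributed
]

def attribute_by_owner(rows):
    """Sum volume per owner tag, bucketing missing tags under 'unattributed'."""
    totals = defaultdict(int)
    for _metric, owner, volume in rows:
        totals[owner or "unattributed"] += volume
    # Largest cost drivers first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

for owner, volume in attribute_by_owner(usage):
    print(f"{owner:<15} {volume:>8,}")
```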
Team-level visibility enables account administrators to hold teams accountable. More importantly, it gives teams the opportunity to understand and reduce their impact on metrics volume.
Individual teams might have limited insight into the costs of the metrics and tags they're submitting. As a result, teams are less motivated to control their usage or even to limit its growth. It is crucial for everyone to have visibility into their usage and feel empowered to take ownership of managing those volumes and associated costs.
To identify which team or service is responsible for your top custom metric names:
All teams should have visibility into which metrics are driving their bill spikes in real-time and feel confident that their cost optimization efforts do not impact another team’s visibility.
To see all actively reporting metric names submitted by your team, go to the Metrics Summary page and enter a tag key-value pair (for example, team:dev or service:demo) in the Filter by Tag Value field.
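The same filter can be applied programmatically. The following is a minimal sketch, assuming the Get active metrics list endpoint (GET /api/v1/metrics) with its from and tag_filter query parameters as described in Datadog's public API reference; verify both against the current reference. The helper name and the US1 site URL are illustrative assumptions, and the sketch only builds the request, so it runs without credentials.

```python
import time
import urllib.parse
import urllib.request

API_BASE = "https://api.datadoghq.com"  # assumption: US1 site; change for your Datadog site

def active_metrics_request(tag_filter, api_key, app_key, lookback_s=3600, now=None):
    """Build (but do not send) a request for metrics reporting with `tag_filter`."""
    now = int(time.time()) if now is None else now
    params = urllib.parse.urlencode({"from": now - lookback_s, "tag_filter": tag_filter})
    return urllib.request.Request(
        f"{API_BASE}/api/v1/metrics?{params}",
        headers={"DD-API-KEY": api_key, "DD-APPLICATION-KEY": app_key},
        method="GET",
    )

req = active_metrics_request("team:dev", "<API_KEY>", "<APP_KEY>", now=1_700_000_000)
print(req.full_url)
# Sending it would look like: json.load(urllib.request.urlopen(req))
```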
All users in your organization can see OOTB real-time estimated custom metrics usage on the Metrics Volume Management page. Datadog's intelligent insights help identify which metrics to focus your cost-optimization efforts on. Use Metrics Volume Management with Metrics without Limits™ to control your indexed custom metrics usage and reduce costs without sacrificing accuracy.
With Metrics Volume Management, you can identify your organization’s largest metrics as well as the metric names spiking in volume (likely culprits of any unexpected overage).
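The two views described above, largest metrics and spiking metrics, can be approximated from any hourly volume series. The sketch below is illustrative only: the series are made up, and the rule "latest value above 1.5x the mean of earlier points" is an arbitrary example, not how Datadog computes its insights.

```python
# Hypothetical hourly indexed volumes per metric name, oldest first.
volumes = {
    "checkout.latency": [9_000, 9_100, 9_050, 9_200],
    "search.qps": [2_000, 2_100, 2_050, 6_300],
    "cache.hits": [500, 510, 505, 520],
}

def largest(series, n=2):
    """Metric names with the highest latest volume."""
    return sorted(series, key=lambda m: series[m][-1], reverse=True)[:n]

def spiking(series, threshold=1.5):
    """Metric names whose latest volume exceeds `threshold` x the mean of earlier points."""
    flagged = []
    for name, points in series.items():
        history = points[:-1]
        baseline = sum(history) / len(history)
        if points[-1] > threshold * baseline:
            flagged.append(name)
    return flagged

print(largest(volumes))  # ['checkout.latency', 'search.qps']
print(spiking(volumes))  # ['search.qps']
```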
For more information, see the Metrics Volume Management documentation.
Effective custom metrics governance should increase monitoring efficiency. After you understand what your usage is and attribute usage to its source, take action to reduce your metrics.
In this section, you’ll learn about the actions you can take to maximize the ROI and value you get from your observability spend without sacrificing the visibility your team actively relies on.
Datadog's Metrics without Limits™ is an industry-first cost management feature that decouples metric ingestion from indexing. Not all your metrics are equally valuable at every moment, and with Metrics without Limits™, you only pay for valuable metrics.
Reduce your indexed custom metrics volume for any metric name by setting a tag configuration that lists the tags you want to keep queryable. This reduces your costs and preserves the mathematical accuracy of your configured metrics, all within the platform and without any code-level changes.
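To see why a tag configuration shrinks indexed volume, recall that a custom metric is a unique combination of a metric name and tag values (including the host tag). The arithmetic below is a worked sketch with hypothetical tag values for one gauge metric: keeping only endpoint and status collapses the per-host combinations. It assumes every combination is actually reported (an upper bound) and covers count, gauge, and rate metrics; distributions are counted differently.

```python
from itertools import product

# Hypothetical tag values reported for one gauge metric.
tag_values = {
    "host": [f"host-{i}" for i in range(100)],
    "endpoint": ["/cart", "/checkout", "/search", "/login"],
    "status": ["2xx", "4xx", "5xx"],
}

def custom_metric_count(values, keep=None):
    """Count unique tag-value combinations for the tag keys in `keep` (all keys if None).

    Assumes every combination is reported, so the result is an upper bound.
    """
    keys = list(values) if keep is None else list(keep)
    return len(set(product(*(values[k] for k in keys))))

ingested = custom_metric_count(tag_values)                              # 100 * 4 * 3
indexed = custom_metric_count(tag_values, keep=["endpoint", "status"])  # 4 * 3
print(ingested, indexed)  # 1200 12
```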
With Metrics without Limits™, Datadog automatically provides the following:
As part of Datadog’s metrics governance best practices, start by using Metrics without Limits on your Top Custom Metrics.
For more details, see the Metrics without Limits™ documentation.
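Tag configurations are usually set in the Datadog UI, but they can also be created through the API. The sketch below builds a request body for the Create a tag configuration endpoint (POST /api/v2/metrics/{metric_name}/tags) as we understand it from Datadog's API reference; treat the field names as assumptions to check against the current reference. The helper function is hypothetical.

```python
import json

def tag_configuration_payload(metric_name, tags, metric_type="gauge"):
    """Body for a Metrics without Limits tag configuration (shape to verify in the API reference)."""
    return {
        "data": {
            "type": "manage_tags",
            "id": metric_name,
            "attributes": {
                "metric_type": metric_type,  # gauge, count, rate, or distribution
                "tags": sorted(tags),        # tag keys that stay queryable
            },
        }
    }

body = tag_configuration_payload("checkout.latency", ["status", "endpoint"])
print(json.dumps(body, indent=2))
```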
Metrics without Limits™ lets users reduce metric costs by indexing less data. When used incorrectly, however, a configuration could lead to unintentional spikes in usage or loss of visibility from tags that are no longer indexed. To prevent unexpected changes, use RBAC permissions: edit an existing user role to include the metrics_tags_write permission, or create a custom role. This gives your organization better control over which members can impact metrics cardinality and who can change Metrics without Limits™ tag configurations.
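Granting metrics_tags_write can also be scripted. The following is a minimal sketch, assuming Datadog's Roles API: look up the permission's ID by name in a GET /api/v2/permissions response, then send a body of the form {"data": {"type": "permissions", "id": ...}} to POST /api/v2/roles/{role_id}/permissions. The response excerpt and IDs below are made up; real IDs are UUIDs.

```python
# Hypothetical excerpt of a GET /api/v2/permissions response.
permissions_response = {
    "data": [
        {"type": "permissions", "id": "perm-111", "attributes": {"name": "metrics_tags_write"}},
        {"type": "permissions", "id": "perm-222", "attributes": {"name": "dashboards_write"}},
    ]
}

def permission_id(response, name):
    """Return the ID of the permission called `name`, or None if it is absent."""
    for perm in response["data"]:
        if perm["attributes"]["name"] == name:
            return perm["id"]
    return None

def grant_permission_body(perm_id):
    """Body for POST /api/v2/roles/{role_id}/permissions (assumed shape)."""
    return {"data": {"type": "permissions", "id": perm_id}}

pid = permission_id(permissions_response, "metrics_tags_write")
print(grant_permission_body(pid))  # {'data': {'type': 'permissions', 'id': 'perm-111'}}
```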
Datadog keeps an audit trail of all Metrics without Limits™ configurations, recording each configuration and the user who made it, so you can attribute any spikes or dips in your custom metrics usage. To view your audit trail events, enter the following query in the Events Explorer:
tags:audit "Queryable tag configuration"
To ensure you're not removing valuable visibility while reducing costs, you need to distinguish the actively queried metrics that your team relies on from the metrics that aren't queried anywhere within the Datadog platform or through the API. Datadog's intelligent query insights continuously compute and analyze all users' interactions (in-app or through the API) with any metric to help identify less valuable, unused metrics.
Identify your organization’s entire list of unqueried metrics over the past 30 days:
After you identify the metrics that your developers don’t need, you can safely reduce the custom metrics volumes and reduce the costs of these unused metrics with Metrics without Limits™.
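The unqueried list can also be pulled through the API. The following is a minimal sketch, assuming the List tag configurations endpoint (GET /api/v2/metrics) accepts filter[queried] together with a window[seconds] lookback of up to 30 days, as we understand Datadog's API reference; confirm the parameters there. It builds the URL and parses a hypothetical response page, and it does not handle pagination.

```python
import urllib.parse

API_BASE = "https://api.datadoghq.com"  # assumption: US1 site; change for your Datadog site
THIRTY_DAYS_S = 30 * 24 * 60 * 60       # 2,592,000 seconds

def unqueried_metrics_url(window_s=THIRTY_DAYS_S):
    """URL listing metrics with no queries in the last `window_s` seconds (assumed params)."""
    params = {"filter[queried]": "false", "window[seconds]": window_s}
    # Brackets are left unescaped for readability; percent-encoded brackets are equivalent.
    return f"{API_BASE}/api/v2/metrics?{urllib.parse.urlencode(params, safe='[]')}"

def metric_names(response):
    """Extract metric names from one parsed JSON response page."""
    return [item["id"] for item in response.get("data", [])]

print(unqueried_metrics_url())
sample = {"data": [{"type": "metrics", "id": "legacy.debug"}, {"type": "metrics", "id": "old.batch.size"}]}
print(metric_names(sample))  # ['legacy.debug', 'old.batch.size']
```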
Based on Datadog’s intelligent query insights across thousands of custom metrics customers, we found that using Metrics without Limits™ on unqueried metrics can reduce the average customer’s custom metrics usage by up to 70%.
Even if a metric has not been queried in the past 30 days, your teams might still derive value from it for incident management and outage remediation. Conversely, your teams could be underutilizing existing, actively queried metrics. Understanding the relative utility of your metrics is therefore the next recommended step in your governance workflow.
Datadog's Metrics without Limits™ is a suite of features that also provides OOTB insights for assessing the value of your actively queried metrics with Metrics Related Assets. A metric's related asset is any Datadog asset, such as a dashboard, notebook, monitor, or SLO, that queries that metric. Use related asset popularity and quantity to evaluate metric utility within your organization and make data-driven decisions. This helps your team understand how to use existing metrics to get more value from your observability spend.
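Related asset quantity lends itself to a simple ranking. The sketch below counts, for each actively queried metric, how many assets query it. The asset inventory is hypothetical, and popularity (for example, how often a dashboard is viewed) is not modeled. A metric with zero related assets may still be queried through the API, so treat a low count as a prompt for review rather than a reason to drop tags automatically.

```python
from collections import Counter

# Hypothetical inventory of (asset type, asset title, metric it queries).
related_assets = [
    ("dashboard", "Checkout overview", "checkout.latency"),
    ("monitor", "Checkout p99 latency", "checkout.latency"),
    ("notebook", "Checkout incident review", "checkout.latency"),
    ("slo", "Checkout availability", "checkout.errors"),
]
actively_queried = ["checkout.latency", "checkout.errors", "search.qps"]

def rank_by_related_assets(assets, metrics):
    """Return (metric, related asset count) pairs, most-used first."""
    counts = Counter(metric for _type, _title, metric in assets)
    return sorted(((m, counts[m]) for m in metrics), key=lambda pair: pair[1], reverse=True)

for metric, n in rank_by_related_assets(related_assets, actively_queried):
    print(f"{metric:<18} {n} related asset(s)")
```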
To view a metric’s related assets:
In this section, you’ll learn about how to:
Datadog offers OOTB metrics that measure estimated custom metrics usage. You can use these metrics in dashboard visualizations and monitor alerts.
| Usage Type | Metric | Description |
|---|---|---|
| Indexed Custom Metrics | datadog.estimated_usage.metrics.custom, datadog.estimated_usage.metrics.custom.by_metric | Unique indexed Custom Metrics seen in the last hour. |
| Ingested Custom Metrics | datadog.estimated_usage.metrics.custom.ingested, datadog.estimated_usage.metrics.custom.ingested.by_metric | Unique ingested Custom Metrics seen in the last hour. |
You can also see a breakdown of your real-time estimated custom metrics usage by metric name with either a dashboard timeseries widget or a metric monitor. Use the datadog.estimated_usage.metrics.custom.by_metric metric to build a monitor so you always have up-to-date visibility into the volume of each of your metric names.
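A monitor on the by-metric estimate can be defined as code. The following is a minimal sketch of a body for the Create a monitor endpoint (POST /api/v1/monitor) using a multi-alert metric query. It assumes the estimate is tagged with a metric_name tag key (confirm the tag key in Metrics Summary), and the 50,000 threshold is a placeholder to tune for your account.

```python
import json

METRIC = "datadog.estimated_usage.metrics.custom.by_metric"
GROUP_TAG = "metric_name"  # assumption: confirm this tag key in Metrics Summary

def volume_monitor_body(threshold, window="last_1h"):
    """Monitor body that alerts separately for each metric name above `threshold`."""
    query = f"avg({window}):sum:{METRIC}{{*}} by {{{GROUP_TAG}}} > {threshold}"
    return {
        "name": "Custom metrics volume above threshold, per metric name",
        "type": "metric alert",
        "query": query,
        # {{metric_name.name}} is template-variable syntax for the alerting group's tag value.
        "message": "Estimated indexed custom metrics for {{metric_name.name}} exceeded the threshold.",
        "options": {"thresholds": {"critical": threshold}},
    }

print(json.dumps(volume_monitor_body(50_000), indent=2))
```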
After you receive an alert, use the Metrics Volume Management page to inspect the tag keys of any spiking metric, and use Metrics without Limits™ to immediately drop any anomalous tag keys that are causing the metric to spike. This lets you resolve unintentional billing spikes right away.
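When investigating a spiking metric, the key decision is which tag keys to keep. The sketch below shows one possible heuristic, not Datadog's own logic: compare each tag key's current cardinality with a baseline and keep only keys that have not grown by more than a chosen factor. The cardinalities and the 10x factor are hypothetical; the resulting list is what you would enter as the metric's queryable tags in Metrics without Limits™.

```python
# Hypothetical per-tag-key cardinality for one spiking metric: (baseline, current).
tag_cardinality = {
    "endpoint": (40, 42),
    "status": (5, 5),
    "request_id": (1, 48_000),  # an unbounded value accidentally submitted as a tag
}

def allowlist_without_spikes(cardinality, ratio=10):
    """Keep tag keys whose current cardinality is below `ratio` x their baseline."""
    return sorted(
        key for key, (baseline, current) in cardinality.items()
        if current < ratio * max(baseline, 1)  # max() guards against a zero baseline
    )

print(allowlist_without_spikes(tag_cardinality))  # ['endpoint', 'status']
```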
Additional use cases to build monitors for with the estimated usage metrics: