as_count() in Monitor Evaluations

Overview

Queries using the as_count() and as_rate() modifiers are calculated in ways that can yield different results in monitor evaluations. Monitors that involve arithmetic and at least one as_count() modifier use a separate evaluation path that changes the order in which arithmetic and time aggregation are performed.

Error rate example

Suppose you want to monitor an error rate over 5 minutes using the metrics requests.error and requests.total. Consider a single evaluation performed with these aligned timeseries points for the 5-minute timeframe:

Numerator: sum:requests.error{*}

| Timestamp           | Value |
|:--------------------|:------|
| 2018-03-13 11:00:30 | 1     |
| 2018-03-13 11:01:30 | 2     |
| 2018-03-13 11:02:40 | 3     |
| 2018-03-13 11:03:30 | 4     |
| 2018-03-13 11:04:40 | 5     |

Denominator: sum:requests.total{*}

| Timestamp           | Value |
|:--------------------|:------|
| 2018-03-13 11:00:30 | 10    |
| 2018-03-13 11:01:30 | 10    |
| 2018-03-13 11:02:40 | 10    |
| 2018-03-13 11:03:30 | 10    |
| 2018-03-13 11:04:40 | 10    |

Two ways to calculate

Refer to this query as classic_eval_path:

sum(last_5m): sum:requests.error{*}.as_rate() / sum:requests.total{*}.as_rate()

and this query as as_count_eval_path:

sum(last_5m): sum:requests.error{*}.as_count() / sum:requests.total{*}.as_count()

Compare the result of the evaluation depending on the path:

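| Path               | Behavior                                     | Expanded expression                | Result |
|:-------------------|:---------------------------------------------|:-----------------------------------|:-------|
| classic_eval_path  | Aggregation function applied after division  | (1/10 + 2/10 + 3/10 + 4/10 + 5/10) | 1.5    |
| as_count_eval_path | Aggregation function applied before division | (1+2+3+4+5) / (10+10+10+10+10)     | 0.3    |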

Note that both evaluations above are mathematically correct. Choose a method that suits your intentions.
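
The difference between the two paths can be reproduced with a few lines of Python. This is only an illustrative sketch of the arithmetic, not how the monitor evaluates queries internally: the function names classic_eval and as_count_eval are hypothetical, and the point values from the tables above are used as-is for both paths.

```python
# Aligned points from the numerator and denominator tables above.
errors = [1, 2, 3, 4, 5]
totals = [10, 10, 10, 10, 10]


def classic_eval(numerator, denominator):
    """Divide point by point, then apply the sum time aggregation."""
    return sum(n / d for n, d in zip(numerator, denominator))


def as_count_eval(numerator, denominator):
    """Apply the sum time aggregation to each series, then divide."""
    return sum(numerator) / sum(denominator)


classic_result = classic_eval(errors, totals)
as_count_result = as_count_eval(errors, totals)

# Rounded to hide floating-point noise from the repeated additions.
print(round(classic_result, 6))   # 1.5
print(round(as_count_result, 6))  # 0.3
```

With a threshold such as 0.5, the first result would breach it while the second would not, which is why the choice of path matters for an error-rate monitor.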

It may be helpful to visualize the classic_eval_path as:

sum(last_5m):error/total

and the as_count_eval_path as:

sum(last_5m):error
-----------------
sum(last_5m):total

In general, avg time aggregation with .as_rate() is reasonable, but sum aggregation with .as_count() is recommended for error rates. Aggregation methods other than sum are not meaningful with .as_count() and cannot be used with it.

Reach out to the Datadog support team if you have any questions.
