Using USM Metrics in Monitors, SLOs, and Dashboards

Overview

Universal Service Monitoring discovers services using popular container tags (such as app, short_image, and kube_deployment) and generates entries in the Service Catalog for those services.

You can access request, error, and duration metrics in Datadog for both inbound and outbound traffic on all services discovered with Universal Service Monitoring. These service health metrics are useful for creating alerts, tracking deployments, and getting started with service level objectives (SLOs) so you can get broad visibility into all services running on your infrastructure.

This guide explains how to search for USM metrics such as universal.http.* and use them in your monitors, SLOs, and dashboards.

USM metrics vs APM metrics

Metric Name	Units	Type	Description
universal.http.client	Seconds	Distribution	Outbound request latency, counts, errors, and rates.
universal.http.client.hits	Hits	Count	Total number of outbound requests and errors.
universal.http.client.apdex	Score	Gauge	The Apdex score of outbound requests for this service.
universal.http.server	Seconds	Distribution	Inbound request latency, counts, errors, and rates.
universal.http.server.hits	Hits	Count	Total number of inbound requests and errors.
universal.http.server.apdex	Score	Gauge	The Apdex score for this web service.

Unlike APM metrics, errors are available under the error:true tag instead of as a separate metric.

Note: The .hits metrics have all of your infrastructure tags and are the recommended way to query request and error counts. You can also add second primary tags to all USM metrics.

Metric syntax

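The USM metric query syntax differs from the APM metric query syntax, which uses trace.*. USM metrics fall under a single distribution metric name, so counts, errors, and percentiles are selected with an aggregation prefix or a tag rather than a separate metric.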

For example:

APM	USM
trace.universal.http.client.hits{*}	count:universal.http.client{*}
trace.universal.http.client.errors	count:universal.http.client{error:true}
trace.universal.http.client.hits.by_http_status	count:universal.http.client{*} by http_status_family
pXX:trace.universal.http.client{*}	pXX:universal.http.client{*}
trace.universal.http.client.apdex{*}	universal.http.client.apdex{*}

The same translations apply for the universal.http.server operation that captures inbound traffic. For more information about distribution metrics, see DDSketch-based Metrics in APM.
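
As an illustration of the translations above, the following Python sketch assembles USM query strings from a metric name, an optional aggregation prefix, and a set of tags. The helper name `usm_query` is hypothetical and not part of any Datadog library; only the metric names, the `error:true` tag, and the `http_status_family` grouping come from this guide.

```python
def usm_query(metric, tags=None, aggregation=None, group_by=None):
    """Build a USM query string such as count:universal.http.client{error:true}."""
    # An empty scope is written as {*}, matching the examples in the table.
    scope = ",".join(f"{key}:{value}" for key, value in (tags or {}).items()) or "*"
    query = f"{metric}{{{scope}}}"
    if aggregation:
        # Aggregation prefix, for example "count" or a percentile such as "p95".
        query = f"{aggregation}:{query}"
    if group_by:
        query += f" by {group_by}"
    return query


# Queries mirroring the USM column of the table.
all_requests = usm_query("universal.http.client", aggregation="count")
errors = usm_query("universal.http.client", {"error": "true"}, aggregation="count")
by_status = usm_query(
    "universal.http.client", aggregation="count", group_by="http_status_family"
)
p95_latency = usm_query("universal.http.client", aggregation="p95")
apdex = usm_query("universal.http.client.apdex")

print(all_requests, errors, by_status, p95_latency, apdex, sep="\n")
```

Swapping `universal.http.client` for `universal.http.server` gives the corresponding inbound queries.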

Usage

Navigate to Infrastructure > Universal Service Monitoring, filter by Universal Service Monitoring telemetry type, and click on a service. The Performance tab displays service-level graphs on hits, latency, requests, errors, and more. You can also access these metrics when creating a monitor or an SLO, or by looking at a dashboard in the Service Catalog.

Create a monitor

You can create an APM Monitor to trigger an alert when a USM metric such as universal.http.client either crosses a threshold or deviates from an expected pattern.

Navigate to Monitors > New Monitor and click APM.
Select APM Metrics and define a service or resource’s env and any other primary tags. Select a service or resource to monitor and define the time interval for the monitor to evaluate the query over.
Select Threshold Alert and choose a USM metric, such as Requests per Second, for the monitor to trigger on. Then, define whether the value should be above or below the alert and warning thresholds. Enter a value for the alert threshold and, optionally, for the warning threshold.
The notification section contains a pre-populated message for the monitor. Customize the alert name and message, and define the permissions for this monitor.
Click Create.

For more information, see the APM Monitor documentation.
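
For teams that manage monitors as code, the sketch below builds a monitor definition as a plain Python dictionary. It is not the APM monitor created by the UI steps above: it assumes a generic metric monitor (type `query alert`) on the `universal.http.server.hits` count metric, and the function name, service name, thresholds, and message text are all hypothetical. The payload shape follows the general layout of the Datadog monitor API, and no request is sent; verify the query against your own account before submitting it.

```python
import json


def build_usm_monitor(service, env, threshold, warning=None, window="last_5m"):
    """Return a monitor definition (dict) alerting on inbound USM request volume."""
    scope = f"service:{service},env:{env}"
    # Assumed query shape: total inbound hits over the window compared to a threshold.
    query = (
        f"sum({window}):sum:universal.http.server.hits{{{scope}}}.as_count()"
        f" > {threshold}"
    )
    thresholds = {"critical": threshold}
    if warning is not None:
        thresholds["warning"] = warning
    return {
        "name": f"High inbound request volume on {service}",
        "type": "query alert",
        "query": query,
        "message": f"Inbound requests for {service} in {env} exceeded {threshold}.",
        "tags": [f"service:{service}", f"env:{env}"],
        "options": {"thresholds": thresholds},
    }


# Hypothetical service and thresholds, for illustration only.
monitor = build_usm_monitor("web-store", "prod", threshold=1000, warning=800)
print(json.dumps(monitor, indent=2))
```

Keeping the definition as data makes it straightforward to review in version control before it is sent to the API or converted to another format.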

Create an SLO

You can create an SLO on a per-service basis to ensure you are meeting objectives set by USM metrics and improving availability over time. Datadog recommends creating an SLO programmatically to cover a lot of services.

To create an SLO from the Service Catalog:

Navigate to the Reliability tab of the Service Catalog.
In the SLOs column, hover over a service and click + Create Availability SLO or + Create Latency SLO.

[Image: Setting up a Universal Service Monitoring SLO for BITSBOUTIQUE]

Optionally, to create an SLO manually using USM metrics:

Navigate to Service Management > SLOs and click New SLO.
Select Metric Based and create two queries in the Good events (numerator) section:
- Query A: Enter a USM metric such as universal.http.server, filter to a specific service by adding the primary service and env tags in the from field, and select count in the as field.
- Query B: Enter a USM metric such as universal.http.server, filter to a specific service by adding the primary service and env tags, in addition to the error:true tag, in the from field, and select count in the as field.
Click + Add Formula and enter a-b.
In the Total events (denominator) section, enter a USM metric such as universal.http.server, filter to a specific service by adding the primary service and env tags in the from field, and select count in the as field.
Click + New Target to create a target threshold with the following settings:
- The time window is 7 Days, the target threshold is 95%, and the warning threshold is 99.5%. Datadog recommends setting the same target threshold across all time windows.
Enter a name and description for this SLO. Set the primary env and service tags, in addition to the team tag.
Click Save and Set Alert.
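
The manual steps above can be sketched as a metric-based SLO definition built in Python, which is one way to cover many services programmatically. The function names, service names, and team name are hypothetical, and the `count:...{...}.as_count()` query form is an assumption about how the UI selections (metric, from tags, as count) are written as a query string. The dictionary follows the general layout of a metric SLO in the Datadog SLO API; nothing is sent to Datadog here.

```python
def build_usm_availability_slo(service, env, team,
                               target=95.0, warning=99.5, timeframe="7d"):
    """Return a metric-based SLO definition (dict) from USM inbound request counts."""
    scope = f"service:{service},env:{env}"
    # Query A in the steps above: all inbound requests for the service.
    total = f"count:universal.http.server{{{scope}}}.as_count()"
    # Query B: the same scope restricted to failed requests.
    errors = f"count:universal.http.server{{{scope},error:true}}.as_count()"
    return {
        "type": "metric",
        "name": f"{service} availability",
        "description": f"Share of successful inbound requests for {service}.",
        # Good events are a - b; total events are all inbound requests.
        "query": {"numerator": f"{total} - {errors}", "denominator": total},
        "thresholds": [
            {"timeframe": timeframe, "target": target, "warning": warning}
        ],
        "tags": [f"env:{env}", f"service:{service}", f"team:{team}"],
    }


def availability(total_requests, error_requests):
    """Percentage of good events, the quantity the SLO compares to its target."""
    return 100.0 * (total_requests - error_requests) / total_requests


# Hypothetical services; one definition is generated per service.
slos = [build_usm_availability_slo(name, "prod", "shop")
        for name in ("checkout", "cart")]
print(slos[0]["query"]["numerator"])
print(availability(1000, 20))
```

With 1,000 inbound requests and 20 errors, the good-event ratio is 98%, which meets the 95% target but falls below the 99.5% warning threshold used in the steps above.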