Azure OpenAI

Overview

Azure OpenAI enables development of copilots and generative AI applications using OpenAI’s library of models. Use the Datadog integration to track the performance and usage of the Azure OpenAI API and deployments.

Setup

Installation

If you haven’t already, set up the Microsoft Azure integration first. There are no other installation steps.

Data Collected

Metrics

azure.cognitiveservices_accounts.active_tokens (gauge)	Total tokens minus cached tokens over a period of time. Applies to PTU and PTU-managed deployments. Use this metric to understand your TPS or TPM-based utilization for PTUs and compare to your benchmarks for target TPS or TPM for your scenarios.
azure.cognitiveservices_accounts.azure_open_ai_requests (count)	Number of calls made to the Azure OpenAI API over a period of time. Applies to PTU, PTU-Managed, and Pay-as-you-go deployments.
azure.cognitiveservices_accounts.blocked_volume (count)	Number of calls made to the Azure OpenAI API and rejected by a content filter applied over a period of time. You can add a filter or apply splitting by the following dimensions: ModelDeploymentName, ModelName, and TextType.
azure.cognitiveservices_accounts.generated_completion_tokens (count)	Number of Generated Completion Tokens from an OpenAI model.
azure.cognitiveservices_accounts.processed_fine_tuned_training_hours (count)	Number of training hours processed on an OpenAI fine-tuned model.
azure.cognitiveservices_accounts.harmful_volume_detected (count)	Number of calls made to Azure OpenAI API and detected as harmful (both block model and annotate mode) by content filter applied over a period of time.
azure.cognitiveservices_accounts.processed_prompt_tokens (count)	Number of prompt tokens processed on an OpenAI model.
azure.cognitiveservices_accounts.processed_inference_tokens (count)	Number of inference tokens processed on an OpenAI model.
azure.cognitiveservices_accounts.prompt_token_cache_match_rate (gauge)	Percentage of the prompt tokens that hit the cache. Shown as percent
azure.cognitiveservices_accounts.provisioned_managed_utilization (gauge)	Utilization % for a provisoned-managed deployment, calculated as (PTUs consumed / PTUs deployed) x 100. When utilization is greater than or equal to 100%, calls are throttled and error code 429 is returned. Shown as percent
azure.cognitiveservices_accounts.provisioned_managed_utilization_v2 (gauge)	Utilization % for a provisoned-managed deployment, calculated as (PTUs consumed / PTUs deployed) x 100. When utilization is greater than or equal to 100%, calls are throttled and error code 429 is returned. Shown as percent
azure.cognitiveservices_accounts.time_to_response (gauge)	Recommended latency (responsiveness) measure for streaming requests. Applies to PTU and PTU-managed deployments. Calculated as time taken for the first response to appear after a user sends a prompt, as measured by the API gateway. Shown as millisecond
azure.cognitiveservices_accounts.total_volume_sent_for_safety_check (count)	Number of calls made to the Azure OpenAI API and detected by a content filter applied over a period of time.

Service Checks

The Azure OpenAI integration does not include any service checks.

Events

The Azure OpenAI integration does not include any events.

Troubleshooting

Need help? Contact Datadog support.