Service Level Objectives, or SLOs, are a key part of the site reliability engineering toolkit. SLOs provide a framework for defining clear targets around application performance, which ultimately help teams provide a consistent customer experience, balance feature development with platform stability, and improve communication with internal and external users.
When creating SLOs, you can choose from the following types:
For a full comparison, see the SLO Type Comparison chart.
Use Datadog’s Service Level Objectives status page to create new SLOs or to view and manage all your existing SLOs.
After you set up the SLO, select it from the Service Level Objectives list view to open the details side panel. The side panel displays the overall status percentage and remaining error budget for each of the SLO’s targets, as well as status bars (monitor-based SLOs) or bar graphs (metric-based SLOs) of the SLI’s history. If you created a grouped monitor-based SLO using one multi alert monitor, or a grouped metric-based SLO using the sum by clause, the side panel displays the status percentage and remaining error budget for each individual group in addition to the overall status percentage and remaining error budget.
Example: If you create a monitor-based SLO to track latency per availability zone, the status percentages and remaining error budget are displayed for the overall SLO and for each individual availability zone that the SLO is tracking.
Note: The remaining error budget is displayed as a percentage and is calculated using the following formula:
$$\text{error budget remaining} = 100 \cdot \frac{\text{current status} - \text{target}}{100 - \text{target}}$$
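To make the formula concrete, here is a minimal Python sketch of it. The function name is illustrative and not part of any Datadog API; percentages are passed as plain numbers (99.5 means 99.5%). It rejects targets of 100% or more, because the denominator 100 - target would be zero.

```python
def error_budget_remaining(current_status: float, target: float) -> float:
    """Remaining error budget, in percent, for a status and target given in percent."""
    if target >= 100:
        # A 100% target leaves a 0% budget and makes the denominator zero.
        raise ValueError("SLO target must be strictly below 100%")
    return 100 * (current_status - target) / (100 - target)


# A 99.5% status against a 99% target leaves half of the budget.
print(error_budget_remaining(99.5, 99.0))  # 50.0
# A negative value means the SLO has used more than its entire budget.
print(error_budget_remaining(98.5, 99.0))  # -50.0
```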
To leverage the benefits of error budgets and error budget alerts, you must set SLO target values strictly below 100%.
Setting a 100% target means having an error budget of 0%, since the error budget equals 100% minus the SLO target. Without an error budget to represent acceptable risk, it becomes difficult to balance the conflicting priorities of maintaining customer-facing reliability and investing in feature development. In addition, SLOs with target values of 100% lead to division-by-zero errors in SLO alert evaluation.
Note: The number of decimal places you can specify for your SLO targets depends on the SLO type and the time windows you choose:
Monitor-based SLOs: Up to two decimal places are allowed for 7-day and 30-day targets, up to three decimal places are allowed for 90-day targets.
Metric-based SLOs: Up to three decimal places are allowed for all targets.
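The decimal-place limits above can be checked before saving a target. This is a hypothetical helper, not a Datadog API; the type labels "metric" and "monitor" mirror the search facet values, and the target is passed as a string so the digits are counted exactly as typed.

```python
from decimal import Decimal


def max_target_decimals(slo_type: str, window_days: int) -> int:
    """Decimal places allowed for a target, following the limits listed above."""
    if slo_type == "metric":
        return 3  # metric-based: three places for every target
    if slo_type == "monitor":
        if window_days in (7, 30):
            return 2
        if window_days == 90:
            return 3
    raise ValueError(f"unsupported SLO type or window: {slo_type!r}, {window_days}d")


def target_precision_ok(target: str, slo_type: str, window_days: int) -> bool:
    """Check a target written as a string, such as "99.95", against the limit."""
    # normalize() drops trailing zeros, so "99.950" counts as two decimal places.
    exponent = Decimal(target).normalize().as_tuple().exponent
    places = max(0, -exponent)
    return places <= max_target_decimals(slo_type, window_days)


print(target_precision_ok("99.995", "monitor", 30))  # False
print(target_precision_ok("99.995", "monitor", 90))  # True
```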
To edit an SLO, hover over the SLO’s row in the list view and click the edit pencil icon that appears at the right of the row, or click on the row to open the details side panel and select the edit button from the cog icon in the top right of the panel.
All users can view SLOs and SLO status corrections, regardless of their associated role. Only users attached to roles with the slos_write permission can create, edit, and delete SLOs.
To create, edit, and delete status corrections, users require the slos_corrections permission. A user with this permission can make status corrections even if they do not have permission to edit those SLOs. For the full list of permissions, see the RBAC documentation.
Restrict access to an individual SLO by specifying a list of roles that are allowed to edit it.
To maintain your edit access to the SLO, the system requires you to include at least one role that you are a member of before saving. Users on the access control list can add roles and can only remove roles other than their own.
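The access-list rules above can be expressed as a validation step. This is a minimal sketch under stated assumptions: the function and role names are hypothetical, and an empty role list is assumed to mean the SLO is unrestricted. It models the documented rules, not Datadog's implementation.

```python
from typing import Set


def check_role_edit(current_roles: Set[str], new_roles: Set[str], my_roles: Set[str]) -> None:
    """Raise if an edit to an SLO's role list breaks the rules described above.

    current_roles: roles on the SLO's list before the edit (empty = unrestricted, assumed)
    new_roles: roles on the list after the edit
    my_roles: roles the editing user belongs to
    """
    if current_roles and not (current_roles & my_roles):
        # Only users already on the access control list can change it.
        raise PermissionError("you are not on this SLO's access control list")
    if not (new_roles & my_roles):
        # Keeps your own edit access after saving.
        raise ValueError("include at least one role you are a member of")
    removed_own = (current_roles - new_roles) & my_roles
    if removed_own:
        raise ValueError(f"you cannot remove your own roles: {sorted(removed_own)}")


check_role_edit({"sre", "dev"}, {"sre"}, {"sre"})  # removing another role is allowed
```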
Note: Users can create SLOs on any monitor even if they do not have write permissions to the monitor. Similarly, users can create SLO alerts even if they do not have write permissions to the SLO. For more information on RBAC permissions for Monitors, see the RBAC documentation or the guide on how to set up RBAC for Monitors.
The Service Level Objectives status page lets you run an advanced search of all SLOs so you can find, view, edit, clone or delete SLOs from the search results.
Advanced search lets you query SLOs by any combination of SLO attributes:
- name and description: text search
- time window: 7d, 30d, 90d
- type: metric, monitor
- creator
- tags: datacenter, env, service, team, etc.
To run a search, use the facet checkboxes on the left and the search bar at the top. When you check the boxes, the search bar updates with the equivalent query. Likewise, when you modify the search bar query (or write one from scratch), the checkboxes update to reflect the change. Query results update in real time as you edit the query; there’s no ‘Search’ button to click.
Group your SLOs by any tag to get a summary view of your data. You can quickly analyze how many SLOs are in each state (breached, warning, OK, and no data), grouped by service, team, user journey, tier, or any other tag set on your SLOs.
Sort SLOs by the status and error budget columns to prioritize which SLOs need your attention. The SLO list displays the details of SLOs over the primary time window selected in your configuration. All other configuration time windows are available to view in the individual side panel. Open the SLO details side panel by clicking the respective table row.
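Grouping by tag and sorting by error budget can be sketched offline in Python. The records below are hypothetical, and their field names are illustrative rather than a Datadog export format; tags use the usual key:value form, and states follow the ones listed above.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def summarize_by_tag(slos: List[dict], tag_key: str) -> Dict[str, Dict[str, int]]:
    """Count SLOs per state for each value of one tag key (for example "team")."""
    summary: Dict[str, Counter] = defaultdict(Counter)
    for slo in slos:
        for tag in slo["tags"]:
            key, _, value = tag.partition(":")
            if key == tag_key:
                summary[value][slo["state"]] += 1
    return {group: dict(counts) for group, counts in summary.items()}


def by_remaining_budget(slos: List[dict]) -> List[dict]:
    """Order SLOs so the ones with the least remaining error budget come first."""
    return sorted(slos, key=lambda slo: slo["error_budget_remaining"])


# Hypothetical records for illustration only.
slos = [
    {"name": "checkout-latency", "tags": ["team:payments"], "state": "breached",
     "error_budget_remaining": -12.0},
    {"name": "checkout-uptime", "tags": ["team:payments"], "state": "ok",
     "error_budget_remaining": 64.0},
    {"name": "search-latency", "tags": ["team:discovery"], "state": "warning",
     "error_budget_remaining": 8.5},
]
print(summarize_by_tag(slos, "team"))
# {'payments': {'breached': 1, 'ok': 1}, 'discovery': {'warning': 1}}
print([slo["name"] for slo in by_remaining_budget(slos)])
# ['checkout-latency', 'search-latency', 'checkout-uptime']
```

SLOs that lack the chosen tag key are left out of the summary, which matches the idea of grouping only by tags set on your SLOs.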
Note: You can view your SLOs from your mobile device home screen by downloading the Datadog Mobile App, available on the Apple App Store and Google Play Store.
SLO tags can be used for filtering on the SLO status page, creating SLO saved views, or grouping SLOs to view. Tags can be added to SLOs in the following ways:
The default SLO view is loaded when you land on the SLO list view.
The default view includes:
Saved views allow you to save and share customized searches in the SLO list view for SLOs that are most relevant for you and your team by sharing:
After you query for a subset of SLOs on the list view, you can add that query as a saved view.
To add a saved view:
To load a saved view, open the Saved Views panel by pressing the Show Views button at the top left of the page and select a saved view from the list. You can also search for saved views in the Filter Saved Views search box at the top of that same Saved Views panel.
Hover over a saved view from the list and select the hyperlink icon to copy the link to the saved view to share it with your teammates.
Once you are using a saved view, you can update it by selecting that saved view, modifying the query, and clicking the Update button below its name in the Saved Views panel. To change the saved view’s name or delete a saved view, hover over its row in the Saved Views panel and click the pencil icon or trash can icon, respectively.
SLO audit events allow you to track the history of your SLO configurations using the Event Explorer or the Audit History tab in the SLO details. Audit events are added to the Event Explorer every time you create, modify, or delete an SLO or SLO status correction. Each event includes information on the configuration of an SLO or SLO status correction, and the stream provides a history of the configuration changes over time.
Each event includes the following SLO configuration information:
Three types of SLO audit events appear in the Event Explorer:
- SLO Created events show the SLO configuration information at creation time.
- SLO Modified events show what configuration information changed during a modification.
- SLO Deleted events show the configuration information the SLO had before it was deleted.
Each event includes the following SLO status correction configuration information:
Three types of SLO status correction audit events appear in the Event Explorer:
- SLO Correction Created events show the status correction configuration information at creation time.
- SLO Correction Modified events show what configuration information changed during a modification.
- SLO Correction Deleted events show the configuration information the status correction had before it was deleted.
To get a full list of all SLO audit events, enter the search query tags:(audit AND slo) in the Event Explorer. To view the list of audit events for a specific SLO, enter tags:audit,slo_id:<SLO ID> with the ID of the desired SLO. You can also query the Event Explorer programmatically using the Datadog Events API.
Note: If you don’t see events appear in the UI, be sure to set the time frame of the Event Explorer to a longer period, for example, the past 7 days.
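For programmatic access, the two Event Explorer queries can be built by a small helper and wrapped in a search request with a seven-day lookback. The helper names and the SLO ID "abc123" are hypothetical, and the filter/query/from/to body layout is an assumption modeled on the Events API v2 search endpoint; check it against the Events API reference before sending anything. The sketch only builds the JSON and makes no network call.

```python
import json
from typing import Optional


def slo_audit_query(slo_id: Optional[str] = None) -> str:
    """Event Explorer query for all SLO audit events, or for a single SLO."""
    if slo_id is None:
        return "tags:(audit AND slo)"
    return f"tags:audit,slo_id:{slo_id}"


def audit_search_body(slo_id: Optional[str] = None, lookback: str = "now-7d") -> str:
    """JSON body for an events search request (layout assumed, see lead-in)."""
    body = {"filter": {"query": slo_audit_query(slo_id), "from": lookback, "to": "now"}}
    return json.dumps(body)


print(audit_search_body("abc123"))
```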
You can also use the “Audit History” tab in the SLO details to view all audit events for an individual SLO:
With Event Monitors, you can set up notifications to track SLO audit events. For example, if you wish to be notified when a specific SLO’s configuration is modified, set an Event Monitor to track the text [SLO Modified] over the tags audit,slo_id:<SLO ID>.
After creating your SLO, you can visualize the data through Dashboards and widgets.
For more information about SLO Widgets, see the SLO widget and SLO List widget pages. For more information on the SLO data source, see the guide on how to Graph historical SLO data on Dashboards.
Status corrections allow you to exclude specific time periods from SLO status and error budget calculations. This way, you can:
When you apply a correction, the time period you specify is dropped from the SLO’s calculation.
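The effect of dropping a corrected period can be illustrated with a simplified, metric-style model in which status is good events divided by total events. The hourly samples are hypothetical, treating each window as half-open [start, end) is an assumption, and monitor-based and time slice SLOs compute status differently.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Sample = Tuple[datetime, int, int]   # (timestamp, good events, total events)
Window = Tuple[datetime, datetime]   # correction window, treated as [start, end)


def corrected_status(samples: List[Sample], corrections: List[Window]) -> Optional[float]:
    """Good/total status in percent, skipping samples inside any correction window."""
    good = total = 0
    for ts, good_count, total_count in samples:
        if any(start <= ts < end for start, end in corrections):
            continue  # this sample is dropped from the calculation
        good += good_count
        total += total_count
    if total == 0:
        return None  # nothing left to evaluate
    return 100 * good / total


t0 = datetime(2024, 1, 1)
samples = [(t0, 100, 100), (t0 + timedelta(hours=1), 50, 100), (t0 + timedelta(hours=2), 99, 100)]
print(corrected_status(samples, []))  # 83.0
print(corrected_status(samples, [(t0 + timedelta(hours=1), t0 + timedelta(hours=2))]))  # 99.5
```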
You have the option to create one-time corrections for ad hoc adjustments, or recurring corrections for predictable adjustments that occur on a regular cadence. One-time corrections require a start and end time, while recurring corrections require a start time, duration, and interval. Recurring corrections are based on the RRULE specification of iCalendar RFC 5545. The supported rules are FREQ, INTERVAL, COUNT, and UNTIL. Specifying an end date for recurring corrections is optional, so a correction can repeat indefinitely.
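To see how a recurring rule turns into concrete correction windows, here is a pure-Python sketch. It supports only the FREQ, INTERVAL, COUNT, and UNTIL parts named above, and only DAILY, WEEKLY, and MONTHLY frequencies (the cadences in the limits table). Time zones are ignored, a trailing Z on UNTIL is treated as naive, and `limit` caps expansion when there is no COUNT. All names are hypothetical and this is not how Datadog evaluates corrections; the `dateutil.rrule` module in python-dateutil covers the full RFC 5545 grammar.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

SUPPORTED_PARTS = {"FREQ", "INTERVAL", "COUNT", "UNTIL"}


def parse_rrule(rule: str) -> Dict[str, str]:
    """Split "FREQ=DAILY;INTERVAL=2" into parts, rejecting unsupported ones."""
    parts = dict(item.split("=", 1) for item in rule.split(";") if item)
    unsupported = set(parts) - SUPPORTED_PARTS
    if unsupported:
        raise ValueError(f"unsupported RRULE parts: {sorted(unsupported)}")
    return parts


def parse_until(value: str) -> datetime:
    """Parse an RFC 5545 DATE or DATE-TIME value; a trailing Z is ignored here."""
    value = value.rstrip("Z")
    return datetime.strptime(value, "%Y%m%dT%H%M%S" if "T" in value else "%Y%m%d")


def nth_occurrence(start: datetime, freq: str, offset: int) -> Optional[datetime]:
    """Start shifted by `offset` periods, or None if that date does not exist."""
    if freq == "DAILY":
        return start + timedelta(days=offset)
    if freq == "WEEKLY":
        return start + timedelta(weeks=offset)
    if freq == "MONTHLY":
        month_index = start.month - 1 + offset
        try:
            return start.replace(year=start.year + month_index // 12,
                                 month=month_index % 12 + 1)
        except ValueError:
            return None  # e.g. the 31st in a 30-day month; RFC 5545 skips it
    raise ValueError(f"unsupported FREQ: {freq}")


def expand_correction(start: datetime, duration: timedelta, rule: str,
                      limit: int = 100, max_steps: int = 10_000) -> List[Tuple[datetime, datetime]]:
    """List (start, end) windows produced by the rule."""
    parts = parse_rrule(rule)
    interval = int(parts.get("INTERVAL", "1"))
    count = int(parts["COUNT"]) if "COUNT" in parts else limit
    until = parse_until(parts["UNTIL"]) if "UNTIL" in parts else None
    windows: List[Tuple[datetime, datetime]] = []
    for step in range(max_steps):
        if len(windows) >= count:
            break
        occurrence = nth_occurrence(start, parts["FREQ"], step * interval)
        if occurrence is None:
            continue
        if until is not None and occurrence > until:
            break
        windows.append((occurrence, occurrence + duration))
    return windows


# A 2-hour monthly window on the 31st skips months that have no 31st.
for begin, end in expand_correction(datetime(2024, 1, 31, 22), timedelta(hours=2),
                                    "FREQ=MONTHLY;COUNT=3"):
    print(begin, "->", end)
```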
For either type of correction, you must select a correction category that states why the correction is being made. The available categories are Scheduled Maintenance, Outside Business Hours, Deployment, and Other. You can optionally include a description to provide additional context if necessary.
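The requirements for the two correction types and the category list can be checked while building a request body. This is a hypothetical builder: the field names (slo_id, category, start, end, duration, rrule, description), the data/type/attributes envelope, and the use of POSIX seconds are assumptions modeled on the SLO status corrections API and should be verified against its reference.

```python
from typing import Optional

CATEGORIES = {"Scheduled Maintenance", "Outside Business Hours", "Deployment", "Other"}


def correction_request(slo_id: str, category: str, start: int, end: Optional[int] = None,
                       duration: Optional[int] = None, rrule: Optional[str] = None,
                       description: Optional[str] = None) -> dict:
    """Build a status correction body; start, end, and duration are POSIX seconds."""
    if category not in CATEGORIES:
        raise ValueError(f"category must be one of {sorted(CATEGORIES)}")
    attributes = {"slo_id": slo_id, "category": category, "start": start}
    if rrule is None:
        # One-time correction: needs an end time after the start.
        if end is None or end <= start:
            raise ValueError("one-time corrections need an end time after the start")
        attributes["end"] = end
    else:
        # Recurring correction: needs a duration and a rule; an end date is optional.
        if duration is None or duration <= 0:
            raise ValueError("recurring corrections need a positive duration")
        attributes.update(duration=duration, rrule=rrule)
        if end is not None:
            attributes["end"] = end
    if description:
        attributes["description"] = description
    return {"data": {"type": "correction", "attributes": attributes}}


print(correction_request("abc123", "Deployment", 1700000000, end=1700003600))
```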
Each SLO has a maximum limit of corrections that can be configured to ensure query performance. These limits only apply to the past 90 days per SLO, so corrections for time periods before the past 90 days do not count towards your limit. This means that:
The 90-day limits per SLO are as follows:
Correction Type | Limit per SLO |
---|---|
One-time | 100 |
Daily recurring | 2 |
Weekly recurring | 3 |
Monthly recurring | 5 |
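A pre-flight check against these limits can be sketched as below. The kind labels are hypothetical, and the counting rule is a simplified reading of the text: a correction counts if its time period ends within the past 90 days or later, and a recurring correction with no end date always counts.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Limits per SLO over the past 90 days, from the table above.
LIMITS = {"one-time": 100, "daily": 2, "weekly": 3, "monthly": 5}


def can_add_correction(existing: List[Tuple[str, Optional[datetime]]], kind: str,
                       now: datetime) -> bool:
    """Whether one more correction of `kind` fits under the 90-day limit.

    existing: (kind, end) pairs; end is None for a recurring correction with no end date.
    """
    cutoff = now - timedelta(days=90)
    # Corrections whose period ended before the cutoff do not count.
    in_window = Counter(k for k, end in existing if end is None or end >= cutoff)
    return in_window[kind] < LIMITS[kind]


now = datetime(2024, 6, 1)
existing = [("weekly", None), ("weekly", None), ("weekly", datetime(2024, 1, 1))]
print(can_add_correction(existing, "weekly", now))  # True: the January one is too old to count
```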
You may configure status corrections through the UI by selecting Correct Status in your SLO’s side panel, through the SLO status corrections API, or with a Terraform resource.
To access SLO status corrections in the UI:
Choose One-Time or Recurring in the Select the Time Correction Window, and specify the time period you wish to correct.
To view, edit, and delete existing status corrections, click the Corrections tab at the top of an SLO’s detailed side panel view.
The SLO Calendar View is available on the SLO status page. In the top right corner, switch from the “Primary” view to the “Weekly” or “Monthly” view to see 12 months of historical SLO status data. The Calendar View is supported for Metric-based SLOs and Time Slice SLOs.
The CSV Export feature is in Private Beta. Complete the form to request access.
The SLO CSV Export feature is available on the SLO status page once you switch to the “Weekly” or “Monthly” Calendar View. In these views, you can access the new “Export to CSV” option to download a CSV of your historical SLO data with the following information:
The following time windows are available for the CSV export:
These times are based on the user’s timezone setting in Datadog.
The SLO statuses are calculated based on the SLO type:
Notes: