SLO Checklist

Getting started

Navigate to the SLO Manage page.
Start thinking from the perspective of your user:
- How are your users interacting with your application?
- What is their journey through the application?
- Which parts of your infrastructure do these journeys interact with?
- What are they expecting from your systems and what are they hoping to accomplish?

Request/response systems

Type of SLI	Description
Availability	Could the server respond to the request successfully?
Latency	How long did it take for the server to respond to the request?
Throughput	How many requests can be handled?

Storage systems

Type of SLI	Description
Availability	Can the data be accessed on demand?
Latency	How long does it take to read or write data?
Durability	Is the data still there when it is needed?

Data pipelines

Type of SLI	Description
Correctness	Was the right data returned?
Freshness	How long does it take for new data or processed results to appear?

Whenever possible, use metric-based SLOs. As a best practice, the error budget should reflect the number of bad events you have left before you breach your SLO. Metric-based SLO calculations are also volume-weighted: periods with more events count for more in the result.
If, instead, you want an SLO that tracks uptime and uses a time-based SLI calculation, use time slice SLOs. Unlike monitor-based SLOs, time slice SLOs don’t require you to maintain an underlying monitor for your SLO.
Finally, consider monitor-based SLOs for use cases that time slice SLOs do not cover, such as SLOs based on non-metric monitors or on multiple monitors.
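To make the difference between count-based and time-based calculations concrete, here is a minimal sketch that computes both over the same data. The hourly buckets, the request counts, and the 99% per-slice threshold are all hypothetical, and the functions illustrate the general idea rather than Datadog's exact computation.

```python
# Hypothetical per-hour traffic: (good_requests, total_requests).
hours = [
    (9_950, 10_000),  # busy hour, 99.5% good
    (50, 100),        # quiet hour, 50% good
    (1_000, 1_000),   # 100% good
    (1_000, 1_000),   # 100% good
]


def count_based_sli(buckets):
    """Good events divided by total events, so busy periods weigh more."""
    good = sum(g for g, _ in buckets)
    total = sum(t for _, t in buckets)
    return good / total


def time_based_sli(buckets, min_good_ratio=0.99):
    """Fraction of time slices that met the (assumed) per-slice threshold."""
    good_slices = sum(1 for g, t in buckets if g / t >= min_good_ratio)
    return good_slices / len(buckets)


print(f"count-based SLI: {count_based_sli(hours):.4f}")
print(f"time-based SLI:  {time_based_sli(hours):.4f}")
```

The quiet hour with 50 failed requests barely moves the count-based result, but it costs a full quarter of the time-based result. That is what volume weighting means in practice.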

For a detailed comparison of the SLO types, see the SLO Type Comparison guide.

Do you require an SLI calculation that is time-based or count-based?

The following SLO types are available in Datadog:

Metric-based SLOs

Example: 99% of requests should complete in less than 250 ms over a 30-day window.
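For a count-based target like the one above, the error budget can be expressed as a number of bad events. The sketch below assumes hypothetical request counts for the window so far; the function name and inputs are illustrative only.

```python
def error_budget(target_pct, total_events, bad_events):
    """Return (allowed_bad, remaining_bad) for a count-based SLO.

    target_pct is the SLO target in percent, for example 99.
    """
    allowed_bad = total_events * (100 - target_pct) / 100
    return allowed_bad, allowed_bad - bad_events


# Hypothetical numbers for a 30-day window.
total_requests = 2_000_000
slow_requests = 12_500  # requests that took 250 ms or longer

allowed, remaining = error_budget(99, total_requests, slow_requests)
sli_pct = 100 * (total_requests - slow_requests) / total_requests

print(f"SLI: {sli_pct:.3f}%")
print(f"allowed slow requests: {allowed:.0f}")
print(f"remaining error budget: {remaining:.0f} requests")
```

A negative remaining value would mean the SLO has already been breached for the window. In a rolling window the total keeps changing, so the budget should be recomputed as new events arrive.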

Monitor-based SLOs

Example: the latency of all user requests should be less than 250 ms 99% of the time in any 30-day window.

Time-based SLI calculation
SLI calculated based on the underlying monitor’s uptime
You can select a single monitor, multiple monitors (up to 20), or a single multi alert monitor with groups.

If you need to create a new monitor, go to the Monitor create page.

Time Slice SLOs

Example: the latency of all user requests should be less than 250 ms 99% of the time in any 30-day window.

Custom metrics (for example, counters)
Integration metrics (for example, load balancer, HTTP requests)
Datadog APM (for example, errors, latency on services and resources)
Datadog Logs (for example, metrics generated from logs for a count of a particular occurrence)

Select your target: 99%, 99.5%, 99.9%, 99.95%, or any other target value that makes sense for your requirements.
Select your time window: a rolling window of the last 7, 30, or 90 days.
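When weighing a target against a time window, it can help to translate the pair into the amount of "bad" time it permits for a time-based SLI. The following sketch does that arithmetic; the function name is hypothetical and the calculation is plain proportion, not a Datadog API.

```python
from datetime import timedelta


def allowed_bad_time(target_pct, window_days):
    """Time that may be spent out of compliance within the window."""
    window = timedelta(days=window_days)
    return window * ((100 - target_pct) / 100)


# Print the budget for each combination of common targets and windows.
for target in (99, 99.5, 99.9, 99.95):
    for days in (7, 30, 90):
        minutes = allowed_bad_time(target, days).total_seconds() / 60
        print(f"{target:>6}% over {days:>2} days: {minutes:8.1f} minutes")
```

For example, a 99.9% target over 30 days leaves about 43 minutes of budget, while 99% over the same window leaves a little over 7 hours.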

Name your SLO.
Add a description: describe what the SLO is tracking and why it is important for your end user experience. You can also add links to dashboards for reference.
Add tags: tagging by team and service is a common practice.
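If you manage SLOs as code rather than through the UI, the choices above map onto a small definition. The sketch below only builds and prints a dictionary. Its field names (`type`, `query`, `thresholds`, and so on) are modeled on the request body of Datadog's SLO API as an assumption, so verify them against the API reference before use. The metric names, tags, and helper function are hypothetical.

```python
import json

ALLOWED_TIMEFRAMES = {"7d", "30d", "90d"}


def build_metric_slo(name, description, tags, numerator, denominator,
                     target, timeframe="30d"):
    """Assemble a metric-based SLO definition (assumed field names)."""
    if timeframe not in ALLOWED_TIMEFRAMES:
        raise ValueError(f"timeframe must be one of {sorted(ALLOWED_TIMEFRAMES)}")
    if not 0 < target < 100:
        raise ValueError("target must be a percentage between 0 and 100")
    return {
        "type": "metric",
        "name": name,
        "description": description,
        "tags": list(tags),
        # Good events over total events; both queries are hypothetical.
        "query": {"numerator": numerator, "denominator": denominator},
        "thresholds": [{"timeframe": timeframe, "target": target}],
    }


slo = build_metric_slo(
    name="Checkout requests succeed",
    description="Share of checkout requests that return without error.",
    tags=["team:payments", "service:checkout"],
    numerator="sum:checkout.requests.ok{*}.as_count()",
    denominator="sum:checkout.requests.total{*}.as_count()",
    target=99.9,
)
print(json.dumps(slo, indent=2))
```

Validating the target and time window up front keeps obviously invalid definitions from reaching whatever tool submits them.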