Getting started
Navigate to the SLO Manage page.
Start thinking from the perspective of your user:
- How are your users interacting with your application?
- What is their journey through the application?
- Which parts of your infrastructure do these journeys interact with?
- What are they expecting from your systems and what are they hoping to accomplish?
Select the relevant SLI
STEP 1
Response/Request
Type of SLI | Description |
---|---|
Availability | Could the server respond to the request successfully? |
Latency | How long did it take for the server to respond to the request? |
Throughput | How many requests can be handled? |
Storage
Type of SLI | Description |
---|---|
Availability | Can the data be accessed on demand? |
Latency | How long does it take to read or write data? |
Durability | Is the data still there when it is needed? |
Pipeline
Type of SLI | Description |
---|---|
Correctness | Was the right data returned? |
Freshness | How long does it take for new data or processed results to appear? |
STEP 2
Do you require an SLI calculation that is time-based or count-based?
The following SLO types are available in Datadog:
Metric-based SLOs
Example: 99% of requests should complete in less than 250 ms over a 30-day window.
- Count-based SLI calculation
- SLI is calculated as the sum of good events divided by the sum of total events
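The count-based calculation above can be sketched in a few lines of Python. This is an illustration only: the function name `count_based_sli` and the request counts are hypothetical and are not part of Datadog.

```python
def count_based_sli(good_events: int, total_events: int) -> float:
    """Return the SLI as a percentage: good events divided by total events."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return 100.0 * good_events / total_events


# Hypothetical 30-day totals: 1,000,000 requests, of which 992,500
# completed in less than 250 ms.
sli = count_based_sli(good_events=992_500, total_events=1_000_000)
print(f"SLI: {sli:.2f}%")           # SLI: 99.25%
print("Target met:", sli >= 99.0)   # Target met: True
```

With these sample numbers the SLI is 99.25%, which satisfies the 99% target in the example.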
Monitor-based SLOs
Example: the latency of all user requests should be less than 250 ms 99% of the time in any 30-day window.
- Time-based SLI calculation
- SLI is calculated based on the underlying monitor’s uptime
- You can select a single monitor, multiple monitors (up to 20), or a single multi alert monitor with groups
If you need to create a new monitor, go to the Monitor create page.
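As a rough illustration of a time-based calculation, the sketch below treats the SLI as the share of the window during which a single monitor was not alerting. The function name `time_based_sli` and the downtime figure are hypothetical. How Datadog combines several monitors is not covered here.

```python
from datetime import timedelta


def time_based_sli(window: timedelta, downtime: timedelta) -> float:
    """Return uptime as a percentage of the window (single monitor assumed)."""
    if window <= timedelta(0):
        raise ValueError("window must be positive")
    if downtime < timedelta(0) or downtime > window:
        raise ValueError("downtime must be between zero and the window length")
    uptime = window - downtime
    # Dividing one timedelta by another yields a plain float ratio.
    return 100.0 * (uptime / window)


# Hypothetical case: the monitor was in an alert state for 3 hours
# during a 30-day window.
sli = time_based_sli(window=timedelta(days=30), downtime=timedelta(hours=3))
print(f"SLI: {sli:.3f}%")           # SLI: 99.583%
print("Target met:", sli >= 99.0)   # Target met: True
```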
Time Slice SLOs
Example: the latency of all user requests should be less than 250 ms 99% of the time in any 30-day window.
- Time-based SLI calculation
- SLI is calculated based on your custom uptime definition using a metric query
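One way to picture a custom uptime definition is to split the window into equal slices and count a slice as uptime when the metric satisfies a condition. The sketch below assumes one latency value per slice and a "less than 250 ms" condition. The function name `time_slice_sli`, the slice granularity, and the sample values are all illustrative assumptions, not Datadog's implementation.

```python
from typing import Sequence


def time_slice_sli(slice_values_ms: Sequence[float], threshold_ms: float = 250.0) -> float:
    """Return the percentage of slices whose latency is below the threshold."""
    if not slice_values_ms:
        raise ValueError("at least one slice is required")
    # A slice counts as uptime only when the condition holds for it.
    good_slices = sum(1 for value in slice_values_ms if value < threshold_ms)
    return 100.0 * good_slices / len(slice_values_ms)


# Hypothetical per-slice latency values in milliseconds (8 slices).
latencies_ms = [120.0, 180.0, 310.0, 95.0, 240.0, 260.0, 130.0, 110.0]
print(f"SLI: {time_slice_sli(latencies_ms):.1f}%")   # SLI: 75.0%
```

Here 6 of the 8 slices meet the condition, so the SLI is 75%, well short of a 99% target.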
Implement your SLIs
- Custom metrics (for example, counters)
- Integration metrics (for example, load balancer, http requests)
- Datadog APM (for example, errors, latency on services and resources)
- Datadog Logs (for example, metrics generated from logs for a count of particular occurrence)
Set your target objective and time window
- Select your target: 99%, 99.5%, 99.9%, 99.95%, or any other target value that makes sense for your requirements.
- Select your time window: over the last rolling 7, 30, or 90 days.
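When comparing targets and windows, it can help to see how much time outside the objective each combination tolerates (often called the error budget). The sketch below applies plain arithmetic to a time-based SLO. The function name `allowed_bad_time` is hypothetical.

```python
from datetime import timedelta


def allowed_bad_time(target_percent: float, window_days: int) -> timedelta:
    """Return the time within the window that may fall outside the objective."""
    if not 0.0 < target_percent < 100.0:
        raise ValueError("target_percent must be between 0 and 100")
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    return timedelta(days=window_days) * ((100.0 - target_percent) / 100.0)


# Compare the listed targets over a rolling 30-day window.
for target in (99.0, 99.5, 99.9, 99.95):
    minutes = allowed_bad_time(target, 30).total_seconds() / 60
    print(f"{target}% over 30 days allows about {minutes:.1f} minutes")
```

For example, a 99.9% target over 30 days leaves roughly 43.2 minutes, while a 99% target over the same window leaves 432 minutes (7.2 hours).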
Name, describe, and tag your SLOs
- Name your SLO.
- Add a description: describe what the SLO is tracking and why it is important for your end user experience. You can also add links to dashboards for reference.
- Add tags: tagging by team and service is a common practice.
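The choices made so far (name, description, tags, target, and time window) can be collected into one record before you enter them in the UI. The sketch below builds a plain Python dictionary. The function `build_slo_definition` and its field names are assumptions for illustration and are not a Datadog API payload. Only the `key:value` tag format follows Datadog's tagging convention.

```python
def build_slo_definition(name: str, description: str, team: str, service: str,
                         target_percent: float, window_days: int) -> dict:
    """Collect SLO settings into a dictionary (hypothetical structure)."""
    if window_days not in (7, 30, 90):
        raise ValueError("window_days must be 7, 30, or 90")
    if not 0.0 < target_percent < 100.0:
        raise ValueError("target_percent must be between 0 and 100")
    return {
        "name": name,
        "description": description,
        # Tags use the key:value form, here for team and service.
        "tags": [f"team:{team}", f"service:{service}"],
        "target_percent": target_percent,
        "window_days": window_days,
    }


# Hypothetical example values.
slo = build_slo_definition(
    name="Checkout request latency",
    description="99% of checkout requests should complete in less than 250 ms.",
    team="payments",
    service="checkout",
    target_percent=99.0,
    window_days=30,
)
print(slo["tags"])   # ['team:payments', 'service:checkout']
```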
View and search
Use tags to search for your SLOs from the SLO list view.