A resource is a particular action for a given service (typically an individual endpoint or query). Read more about resources in Getting Started with APM. For each resource, APM automatically generates a dashboard page covering:
- Key health metrics
- Monitor status for all monitors associated with this service
- List of metrics for all resources associated with this service
Out-of-the-box graphs
Datadog provides out-of-the-box graphs for any given resource. Use the dropdown above each graph to change the displayed information.
Requests and Errors
The Requests and Errors graph displays the total number of requests (hits) and errors over time. Using the dropdown menu, you can also view:
- Requests by Version: Breakdown of requests across different service versions.
- Requests per Second by Version: The rate of requests for each version.
- Requests and Errors Per Second: The rate of requests (hits) and errors per second.
Errors
The Errors graph displays the total count of errors over time. Using the dropdown menu, you can also view:
- Errors by Version: The error counts for each service version side by side.
- Errors per Second by Version: The error rate (errors per second) for each service version over time.
- Errors per Second: The overall error rate for the service, per second.
- % Error Rate by Version: The percentage of requests resulting in errors for each service version.
- % Error Rate: The overall error rate for the service, as a percentage.
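The relationship between these views can be sketched with a little arithmetic. The example below is a minimal illustration, not Datadog's implementation: the `Bucket` type, its field names, and the sample numbers are all hypothetical, and it assumes each bucket holds the hit and error counts for one service version over a fixed number of seconds.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    """One time bucket of hypothetical request data for a single version."""
    version: str
    hits: int      # total requests in the bucket
    errors: int    # requests that ended in an error
    seconds: int   # bucket width in seconds


def errors_per_second(bucket: Bucket) -> float:
    """Error count divided by the bucket width."""
    return bucket.errors / bucket.seconds


def error_rate_percent(bucket: Bucket) -> float:
    """Share of requests that resulted in an error, as a percentage."""
    # An idle version reports 0% instead of dividing by zero.
    if bucket.hits == 0:
        return 0.0
    return 100.0 * bucket.errors / bucket.hits


buckets = [
    Bucket(version="v1", hits=1200, errors=12, seconds=60),
    Bucket(version="v2", hits=300, errors=15, seconds=60),
]

# "% Error Rate by Version": one percentage per version.
by_version = {b.version: error_rate_percent(b) for b in buckets}

# "% Error Rate": all errors over all hits, regardless of version.
overall = 100.0 * sum(b.errors for b in buckets) / sum(b.hits for b in buckets)
```

Note that the overall percentage is weighted by traffic, so a version with few requests and a high error rate moves it less than the per-version view might suggest.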
Latency
The Latency graph displays the latency percentiles as a timeseries. Using the dropdown menu, you can also view:
- Latency by Version: Latency broken down by service version.
- Historical Latency: Comparison of the current latency distribution with the previous day and week.
- Latency Distribution: The distribution of latencies over the selected time frame.
- Latency by Error: The latency of requests over time, segmented by whether the requests resulted in errors.
- Apdex (Application Performance Index): The Apdex score over time.
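As a rough illustration of what an Apdex score measures, the sketch below applies the standard Apdex formula: requests at or under a threshold T are satisfied, requests up to 4T are tolerating, and everything else is frustrated. The function name, the sample data, and the choice to count errored requests as frustrated are assumptions made for this example; check your service's Apdex configuration for the threshold actually in use.

```python
def apdex(latencies_s, errors, threshold_s):
    """Standard Apdex score: (satisfied + tolerating / 2) / total.

    latencies_s -- request latencies in seconds
    errors      -- parallel list of booleans, True if the request errored
    threshold_s -- the satisfaction threshold T, in seconds
    """
    if len(latencies_s) != len(errors):
        raise ValueError("latencies_s and errors must have the same length")
    total = len(latencies_s)
    if total == 0:
        return None  # no requests, so no score

    satisfied = 0
    tolerating = 0
    for latency, is_error in zip(latencies_s, errors):
        if is_error:
            continue  # assumption: errored requests count as frustrated
        if latency <= threshold_s:
            satisfied += 1
        elif latency <= 4 * threshold_s:
            tolerating += 1
    return (satisfied + tolerating / 2) / total


latencies = [0.1, 0.3, 0.6, 1.5, 3.0, 0.2]
errored = [False, False, False, False, False, True]
score = apdex(latencies, errored, threshold_s=0.5)
```

With a threshold of 0.5 seconds, two requests are satisfied, two are tolerating, and two are frustrated (one slow, one errored), which gives a score of 0.5.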
Avg Time per Request
For services that call multiple downstream services, a fourth graph breaks down the average execution time spent per request. Unlike the other top graphs, which use unsampled data sources, this graph is built on sampled trace data.
Using the dropdown menu, you can also view:
- Total Time Spent: The cumulative time spent in each downstream service over time.
- % of Time Spent: The percentage of time spent in each downstream service relative to the total time.
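The three views of this graph are different normalizations of the same per-service totals. The sketch below shows that relationship on hypothetical data; the function name, the `(service, exec_ms)` tuple shape, and the service names are assumptions for illustration, not Datadog's data model.

```python
from collections import defaultdict


def time_breakdown(spans, request_count):
    """Summarize execution time per service.

    spans         -- iterable of (service_name, execution_time_ms) pairs
    request_count -- number of requests the spans belong to
    """
    if request_count <= 0:
        raise ValueError("request_count must be positive")

    totals = defaultdict(float)
    for service, exec_ms in spans:
        totals[service] += exec_ms

    grand_total = sum(totals.values())
    return {
        service: {
            "total_ms": total,                            # Total Time Spent
            "avg_ms_per_request": total / request_count,  # Avg Time per Request
            "percent": 100.0 * total / grand_total if grand_total else 0.0,  # % of Time Spent
        }
        for service, total in totals.items()
    }


sampled_spans = [
    ("web-store", 40.0),
    ("postgres", 25.0),
    ("redis", 5.0),
    ("web-store", 20.0),
    ("postgres", 10.0),
]
breakdown = time_breakdown(sampled_spans, request_count=2)
```

Because the real graph is built on sampled traces, the absolute totals depend on the sample, while the percentages tend to be the more stable figure to compare over time.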
For services like Postgres or Redis, which are final operations that do not call other services, there is no sub-services graph.
Watchdog performs automatic anomaly detection on the Requests, Latency, and Error graphs. If an anomaly is detected, an overlay appears on the graph. Clicking the Watchdog icon provides more details in a side panel.
Export to dashboard
In the upper-right corner of each graph, click the up arrow to export the graph to an existing dashboard.
Latency distribution
The resource page also displays a latency distribution graph for the resource.
Use the top right percentile selectors to zoom into a given percentile, or hover over the sidebar to view percentile markers.
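To make the percentile markers concrete, the sketch below computes percentiles from a list of raw latencies using the nearest-rank method. This is one common definition chosen for illustration; the source does not state how Datadog computes its percentiles, and the function name and sample values are hypothetical.

```python
import math


def nearest_rank_percentile(values, p):
    """Return the p-th percentile (integer p, 1 to 100) using the nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    ordered = sorted(values)
    # Rank is the smallest position covering at least p% of the values.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Twenty hypothetical request latencies in milliseconds: 10, 20, ..., 200.
latencies_ms = [10 * i for i in range(1, 21)]

p50 = nearest_rank_percentile(latencies_ms, 50)
p95 = nearest_rank_percentile(latencies_ms, 95)
p99 = nearest_rank_percentile(latencies_ms, 99)
```

Zooming into a percentile on the graph corresponds to looking only at requests at or below that value, which is why a long tail disappears when you select a lower percentile.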
Dependency Map with Navigator
You can also view a map of all of a resource’s upstream and downstream service dependencies. With the Dependency Map Navigator, you can see the flow of services, with spans that go through a specific resource (endpoint, database query, etc.) end-to-end, along with their request counts.
This map is based on a sample of ingested spans; the sample is drawn by a fixed sampling algorithm that considers the structure of traces. The sampling algorithm is not configurable and is not impacted by ingestion control.
The dependency map is only available for resources containing service entry spans.
Hover over a node to view metrics of each service including requests/second, error rate, and average latency. Click on a node to open a context menu with options to view the Service Page, related traces, and more.
The highlight color of the node indicates the service’s monitor status. If a service has more than one configured monitor, the status of the most severe monitor is shown.
Load amplification
A service has load amplification if it receives more than 100% of the requests received by the selected resource upstream. Services with call paths highlighted in orange have load amplification, and the amplification multiplier is shown in the list on the panel. The amplification is calculated from the requests received by the resource (highlighted on the map) and the requests received by the downstream service (shown inside the downstream service node on the map). Click a service in the list to see the spans contributing to the amplification.
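The multiplier described above can be read as a simple ratio. The sketch below is a minimal illustration that assumes the multiplier is the downstream service's request count divided by the resource's request count; the function name, service names, and numbers are hypothetical.

```python
def amplified_services(resource_requests, downstream_requests):
    """Return {service: multiplier} for downstream services with load amplification.

    resource_requests   -- requests received by the selected resource
    downstream_requests -- {service_name: requests received by that service}
    """
    if resource_requests <= 0:
        raise ValueError("resource_requests must be positive")

    result = {}
    for service, received in downstream_requests.items():
        multiplier = received / resource_requests
        # More than 100% of the resource's requests means amplification.
        if multiplier > 1.0:
            result[service] = multiplier
    return result


amplified = amplified_services(
    resource_requests=1000,
    downstream_requests={"auth": 1000, "inventory-db": 3500, "cache": 400},
)
```

In this example only `inventory-db` is amplified, at 3.5 times the resource's load. A pattern like this often points to a query issued inside a loop, which is what the contributing spans help confirm.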
Frontend Impact
Datadog gives you visibility into how a web resource impacts your frontend applications. You can see which frontend views send requests to the resource and identify views that are experiencing high latency or errors from the resource.
Isolate requests and errors over time for a specific frontend view by hovering over a RUM View Name in the table and clicking on Isolate this View. From here, you can explore sampled traces originating from the frontend views by clicking on View Traces at the top right of the panel. You can also investigate the sampled RUM sessions for each view by clicking on the context menu for a frontend view in the table.
The frontend impact panel is only available if you use Real User Monitoring (RUM) and the resource belongs to a web service. Unlike the requests, errors, and latency graphs, which use unsampled data sources, the frontend impact metrics are built on sampled trace data from the past hour:
RUM View Name:
- Name of the frontend view
App Name:
- Name of application that contains the frontend view
Sessions:
- Number of sessions for the frontend view
Error Rate Per Session:
- Error rate across the sessions that included the frontend view
P95 Latency:
- P95 latency for requests originating from the frontend view
Requests:
- Number of requests originating from the frontend view
Span summary
For a given resource, Datadog provides a span analysis breakdown of all matching traces.
The displayed metrics represent, per span:
Avg Spans/trace
- Average number of occurrences of the span, for traces including the current resource, where the span is present at least once.
% of Traces
- Percentage of traces including the current resource where the span is present at least once.
Avg Duration
- Average duration of the span, for traces including the current resource, where the span is present at least once.
Avg % Exec Time
- Average ratio of execution time for which the span was active, for traces including the current resource, where the span is present at least once.
Note: A span is considered active when it’s not waiting for a child span to complete. The active spans at a given time, for a given trace, are all the leaf spans (in other words, spans without children).
The span summary table is only available for resources containing service entry spans.
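The first three span summary metrics can be sketched from a small set of traces. This example is an illustration under stated assumptions, not Datadog's implementation: each trace is modeled as a list of hypothetical `(span_name, duration_ms)` pairs, every trace is assumed to include the current resource, and Avg Duration is taken as the mean over all occurrences of the span. Avg % Exec Time is left out because it requires parent and child timing information that this simplified model does not carry.

```python
from collections import defaultdict


def span_summary(traces):
    """Compute per-span summary metrics over a list of traces."""
    counts = defaultdict(list)     # span name -> occurrences in each trace containing it
    durations = defaultdict(list)  # span name -> every observed duration

    for trace in traces:
        per_trace = defaultdict(int)
        for name, duration_ms in trace:
            per_trace[name] += 1
            durations[name].append(duration_ms)
        for name, occurrences in per_trace.items():
            counts[name].append(occurrences)

    summary = {}
    for name, per_trace_counts in counts.items():
        summary[name] = {
            # Averaged only over traces where the span is present at least once.
            "avg_spans_per_trace": sum(per_trace_counts) / len(per_trace_counts),
            "pct_of_traces": 100.0 * len(per_trace_counts) / len(traces),
            "avg_duration_ms": sum(durations[name]) / len(durations[name]),
        }
    return summary


traces = [
    [("http.request", 100.0), ("postgres.query", 10.0), ("postgres.query", 20.0)],
    [("http.request", 80.0), ("postgres.query", 30.0)],
    [("http.request", 60.0)],
    [("http.request", 40.0), ("redis.command", 2.0)],
]
summary = span_summary(traces)
```

Here `postgres.query` appears in half of the traces, averaging 1.5 spans per trace where it is present. A high Avg Spans/trace combined with a short Avg Duration is a typical signature of repeated small queries.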
Traces
Consult the list of traces associated with this resource in the Trace search modal, which is already filtered on your environment, service, operation, and resource name.