APM Terms and Concepts

Overview

The APM UI provides many tools to troubleshoot application performance and correlate it with data throughout the product, helping you find and resolve issues in distributed systems.

For additional definitions and descriptions of important APM terms such as spans and indexed, see the main Glossary.

| Concept | Description |
|---|---|
| Service | Services are the building blocks of modern microservice architectures. Broadly, a service groups together endpoints, queries, or jobs for the purposes of building your application. |
| Resource | Resources represent a particular domain of a customer application. They are typically an instrumented web endpoint, database query, or background job. |
| Monitors | APM metric monitors work like regular metric monitors, but with controls tailored specifically to APM. Use these monitors to receive alerts at the service level on hits, errors, and a variety of latency measures. |
| Trace | A trace tracks the time an application spends processing a request and the status of that request. Each trace consists of one or more spans. |
| Trace Context Propagation | The method of passing trace identifiers between services, enabling Datadog to stitch together individual spans into a complete distributed trace. |
| Retention Filters | Retention filters are tag-based controls set within the Datadog UI that determine which spans to index in Datadog for 15 days. |
| Ingestion Controls | Ingestion controls are used to send up to 100% of traces to Datadog for live search and analytics for 15 minutes. |
| Instrumentation | Instrumentation is the process of adding code to your application to capture and report observability data. |
| Baggage | Baggage is contextual information that is passed between traces, metrics, and logs in the form of key-value pairs. |

Services

After instrumenting your application, the Service Catalog is your main landing page for APM data.

[Image: Service Catalog]

Services are the building blocks of modern microservice architectures. Broadly, a service groups together endpoints, queries, or jobs for the purposes of scaling instances. Some examples:

  • A group of URL endpoints may be grouped together under an API service.
  • A group of DB queries may be grouped together within one database service.
  • A group of periodic jobs may be configured in the crond service.

The screenshot below shows a distributed microservice system for an e-commerce site builder. The web-store, ad-server, payment-db, and auth-service are each represented as a service in APM.

[Image: service map]

All services can be found in the Service Catalog and are visually represented on the Service Map. Each service has its own Service page where you can view and inspect trace metrics such as throughput, latency, and error rates. Use these metrics to create dashboard widgets and monitors, and to see the performance of every resource that belongs to the service, such as a web endpoint or database query.

Don't see the HTTP endpoints you were expecting on the Service page? In APM, endpoints are connected to a service not only by the service name but also by the `span.name` of the trace's entry-point span. For example, on the web-store service above, `web.request` is the name of the entry-point span.
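
To make the lookup rule concrete, here is a minimal Python sketch of it. It is a hypothetical simplification for illustration, not Datadog's implementation. The `Span` record and the `endpoints_for` helper are invented names. The rule for detecting an entry-point span is also an assumption: a span counts as one if it has no parent, or if its parent belongs to a different service.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    # Hypothetical, simplified span record; real spans carry many more fields.
    trace_id: int
    span_id: int
    parent_id: Optional[int]  # None for the root span of the trace
    service: str
    name: str       # span (operation) name, e.g. "web.request"
    resource: str   # e.g. "GET /productpage"


def is_entry_point(span: Span, by_id: dict) -> bool:
    """Assumed rule: a span is a service entry point when it has no parent,
    or when its parent belongs to a different service."""
    parent = by_id.get((span.trace_id, span.parent_id))
    return parent is None or parent.service != span.service


def endpoints_for(spans: List[Span], service: str, span_name: str) -> List[str]:
    """Resources listed for a service when viewed with a given entry-point span name."""
    by_id = {(s.trace_id, s.span_id): s for s in spans}
    return sorted({
        s.resource
        for s in spans
        if s.service == service and s.name == span_name and is_entry_point(s, by_id)
    })
```

In this model, an endpoint whose entry-point span is named `rack.request` does not appear when the web-store service is viewed with `web.request`, even though both share the same service name.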

Resources

Resources represent a particular domain of a customer application. They are typically an instrumented web endpoint, database query, or background job. For a web service, resources can be dynamic web endpoints grouped under a static span name, such as `web.request`. In a database service, resources are database queries with the span name `db.query`. For example, the web-store service has automatically instrumented resources (web endpoints) that handle checkouts, updating carts, adding items, and so on. A resource name can be the HTTP method and route, such as `GET /productpage`, or a controller action, such as `ShoppingCartController#checkout`.

Each resource has its own Resource page with trace metrics scoped to that endpoint. Trace metrics can be used like any other Datadog metric: you can export them to a dashboard or use them to create monitors. The Resource page also shows a span summary widget with an aggregate view of spans across all traces, the latency distribution of requests, and traces of requests made to this endpoint.
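
Datadog computes trace metrics for you. The short Python sketch below only illustrates the idea of scoping metrics to a resource: request spans are grouped by resource name, and hits, errors, and latency are computed per group. The `RequestSpan` record and the `resource_metrics` function are hypothetical. The chosen statistics (error rate, average, maximum) are illustrative and are not Datadog's exact metric definitions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RequestSpan:
    # Hypothetical minimal record for one request handled by a resource.
    resource: str       # e.g. "GET /productpage"
    duration_ms: float
    error: bool


def resource_metrics(spans: List[RequestSpan]) -> Dict[str, dict]:
    """Group request spans by resource and compute simple per-resource metrics."""
    durations: Dict[str, List[float]] = defaultdict(list)
    errors: Dict[str, int] = defaultdict(int)
    for s in spans:
        durations[s.resource].append(s.duration_ms)
        errors[s.resource] += int(s.error)
    return {
        resource: {
            "hits": len(ds),
            "errors": errors[resource],
            "error_rate": errors[resource] / len(ds),
            "avg_ms": sum(ds) / len(ds),
            "max_ms": max(ds),
        }
        for resource, ds in durations.items()
    }
```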

Trace

A trace tracks the time an application spends processing a request and the status of that request. Each trace consists of one or more spans. The flame graph view shows what happened during the lifetime of the request:

  • Distributed calls across services, which appear because a trace ID is injected into and extracted from HTTP headers.
  • Spans from automatically instrumented libraries.
  • Spans from manual instrumentation using open-source tools such as OpenTracing.

On the Trace View page, each trace collects information that connects it to other parts of the platform. This includes logs connected to traces, tags added to spans, and collected runtime metrics.
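
As a rough model of the trace structure described above, the Python sketch below treats a trace as a tree of spans linked by parent IDs. It computes the trace's total duration and prints the spans parent-first with indentation, similar to how a flame graph nests them. The `Span` fields, the `trace_duration` and `render_tree` helpers, and the millisecond timings are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    # Hypothetical span record: every span in one trace shares the same trace.
    span_id: int
    parent_id: Optional[int]  # None for the root span
    service: str
    name: str
    start_ms: float
    duration_ms: float


def trace_duration(spans: List[Span]) -> float:
    """Wall-clock time from the earliest span start to the latest span end."""
    start = min(s.start_ms for s in spans)
    end = max(s.start_ms + s.duration_ms for s in spans)
    return end - start


def render_tree(spans: List[Span]) -> List[str]:
    """One indented line per span, children under their parent, ordered by start time."""
    children: dict = {}
    for s in spans:
        children.setdefault(s.parent_id, []).append(s)
    lines: List[str] = []

    def walk(parent_id: Optional[int], depth: int) -> None:
        for s in sorted(children.get(parent_id, []), key=lambda x: x.start_ms):
            lines.append(f"{'  ' * depth}{s.service}: {s.name} ({s.duration_ms:g} ms)")
            walk(s.span_id, depth + 1)

    walk(None, 0)
    return lines
```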

[Image: trace view]

Trace context propagation

Trace context propagation is the method of passing trace identifiers between services in a distributed system. It enables Datadog to stitch together individual spans from different services into a single distributed trace. Trace context propagation works by injecting identifiers, such as the trace ID and parent span ID, into HTTP headers as the request flows through the system. The downstream service then extracts these identifiers and continues the trace. This allows Datadog to reconstruct the full path of a request across multiple services.
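
The inject/extract cycle can be sketched in a few lines of Python. The header names `x-datadog-trace-id` and `x-datadog-parent-id` are the ones used by Datadog's own propagation style. Everything else is a simplified, hypothetical model: the `SpanContext` class, the helper functions, and the 64-bit decimal IDs. The sketch also omits other headers, such as sampling priority. In practice, Datadog tracing libraries perform this injection and extraction for you and support other formats, such as W3C Trace Context.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Optional

# Datadog-style propagation headers (other formats exist, e.g. W3C traceparent).
TRACE_ID_HEADER = "x-datadog-trace-id"
PARENT_ID_HEADER = "x-datadog-parent-id"


@dataclass
class SpanContext:
    trace_id: int
    span_id: int
    parent_id: Optional[int] = None


def new_id() -> int:
    # Random non-zero 64-bit identifier.
    return secrets.randbits(64) or 1


def inject(ctx: SpanContext, headers: Dict[str, str]) -> None:
    """Upstream: write the trace ID and the current span ID into outgoing headers."""
    headers[TRACE_ID_HEADER] = str(ctx.trace_id)
    headers[PARENT_ID_HEADER] = str(ctx.span_id)


def extract(headers: Dict[str, str]) -> Optional[SpanContext]:
    """Downstream: read the identifiers back; return None if missing or invalid."""
    lowered = {k.lower(): v for k, v in headers.items()}
    try:
        return SpanContext(int(lowered[TRACE_ID_HEADER]), int(lowered[PARENT_ID_HEADER]))
    except (KeyError, ValueError):
        return None


def continue_trace(incoming: Optional[SpanContext]) -> SpanContext:
    """Start the downstream span: keep the trace ID, mint a new span ID,
    and record the upstream span as its parent."""
    if incoming is None:
        return SpanContext(trace_id=new_id(), span_id=new_id())  # no context: new trace
    return SpanContext(trace_id=incoming.trace_id, span_id=new_id(), parent_id=incoming.span_id)
```

Because the downstream span keeps the upstream trace ID and points its `parent_id` at the upstream span, the spans from both services can later be joined into one tree.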

For more information, see the documentation on propagating the trace context for your application's language.

Retention filters

Set tag-based filters in the UI to index spans for 15 days for use with Trace Search and Analytics.

Ingestion controls

Send 100% of traces from your services to Datadog and combine with tag-based retention filters to keep traces that matter for your business for 15 days.
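
Retention filters are configured in the Datadog UI, not in code. The Python sketch below only models the decision described above: every ingested span is available for live search, and only spans matching a tag-based filter are kept (indexed) for 15 days. The `RetentionFilter` class and its exact-match tag logic are hypothetical simplifications. Real filters are defined with search queries in the UI.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Span:
    name: str
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class RetentionFilter:
    # Hypothetical: a span matches when it carries every listed tag value.
    name: str
    required_tags: Dict[str, str]

    def matches(self, span: Span) -> bool:
        return all(span.tags.get(k) == v for k, v in self.required_tags.items())


def split_ingested(spans: List[Span], filters: List[RetentionFilter]) -> Tuple[List[Span], List[Span]]:
    """All spans are ingested (live search, 15 minutes);
    only spans matching at least one filter are indexed (15 days)."""
    live = list(spans)
    indexed = [s for s in spans if any(f.matches(s) for f in filters)]
    return live, indexed
```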

Instrumentation

Instrumentation is the process of adding code to your application to capture and report observability data to Datadog, such as traces, metrics, and logs. Datadog provides instrumentation libraries for various programming languages and frameworks.

You can automatically instrument your application when you install the Datadog Agent with Single Step Instrumentation or when you manually add Datadog tracing libraries to your code.

You can use custom instrumentation by embedding tracing code directly into your application code. This allows you to programmatically create, modify, or delete traces to send to Datadog.
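
Datadog's tracing libraries provide their own custom instrumentation APIs, which are not reproduced here. To show the general pattern, the Python sketch below wraps blocks of application code in spans, nests them under the active span, tags them, and marks failures. Everything in it is hypothetical: the `trace` context manager, the `Span` record, the in-memory `FINISHED` list (standing in for sending spans to the Agent), and the `checkout` example.

```python
import time
from contextlib import contextmanager
from contextvars import ContextVar
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Span:
    name: str
    resource: str
    parent: Optional["Span"] = None
    tags: Dict[str, str] = field(default_factory=dict)
    start: float = 0.0
    duration: float = 0.0
    error: bool = False


# Hypothetical in-memory sink standing in for sending spans to the Agent.
FINISHED: List[Span] = []
# Tracks the active span so nested blocks become child spans.
_active: "ContextVar[Optional[Span]]" = ContextVar("active_span", default=None)


@contextmanager
def trace(name: str, resource: str) -> Iterator[Span]:
    """Wrap a block of code in a span, nested under the currently active span."""
    span = Span(name=name, resource=resource, parent=_active.get(), start=time.monotonic())
    token = _active.set(span)
    try:
        yield span
    except Exception:
        span.error = True  # mark the span as failed, then let the exception propagate
        raise
    finally:
        span.duration = time.monotonic() - span.start
        _active.reset(token)
        FINISHED.append(span)


def checkout(cart_id: str) -> None:
    # Example: manually instrument a checkout handler and one of its steps.
    with trace("cart.checkout", resource="ShoppingCartController#checkout") as span:
        span.tags["cart.id"] = cart_id
        with trace("payment.charge", resource="charge_card"):
            pass  # call the payment provider here
```

Using `ContextVar` rather than a plain global keeps the "active span" correct when code runs in threads or asyncio tasks.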

To learn more, read Application Instrumentation.

Baggage

Baggage allows you to propagate key-value pairs (also known as baggage items) across service boundaries in a distributed system. Unlike trace context, which focuses on trace identifiers, baggage allows for the transmission of business data and other contextual information alongside traces.
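
As an illustration, the Python sketch below serializes baggage items into a single header value and parses them back. It assumes the W3C Baggage header format: a `baggage` header holding comma-separated `key=value` members, with values percent-encoded and optional `;`-separated properties. Keys are assumed to already be valid tokens. The helper names are hypothetical. Datadog tracing libraries handle baggage propagation themselves, and the formats they support vary by language.

```python
from typing import Dict
from urllib.parse import quote, unquote

BAGGAGE_HEADER = "baggage"


def encode_baggage(items: Dict[str, str]) -> str:
    """Serialize key-value pairs as a W3C-style baggage header value.
    Keys are assumed to be valid tokens; values are percent-encoded."""
    return ",".join(f"{k}={quote(v, safe='')}" for k, v in items.items())


def decode_baggage(value: str) -> Dict[str, str]:
    """Parse a baggage header value into a dict, skipping malformed members."""
    items: Dict[str, str] = {}
    for member in value.split(","):
        member = member.split(";", 1)[0].strip()  # drop optional properties
        if "=" not in member:
            continue
        key, _, val = member.partition("=")
        items[key.strip()] = unquote(val.strip())
    return items
```

For example, an upstream service could send `{"baggage": encode_baggage({"user.id": "42"})}` with a request, and the downstream service could read the same `user.id` value back with `decode_baggage`.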

To learn more, read about the supported propagation formats for your application's language.

