This page is not yet available in Spanish. We are working on its translation.
If you have any questions or feedback about our current translation project, feel free to reach out to us!

Overview

When you need to further refine the results of a log search, use subqueries to compare your results against a secondary group of logs, or filter your logs using data from your Reference Tables.

Filter logs with subqueries

Filtering logs with subqueries is in beta.

Use subqueries when you want to filter the results of a query based on the results of a secondary query. See subquery examples for two example scenarios.

To add a subquery filter:

  1. Navigate to Log Explorer.
  2. Enter a query in the search bar to filter your logs. This is the main query.
  3. Click + Add.
  4. Under the Add Query Filter section, select Logs.

This introduces new elements to the query editor:

The subquery editor under the main search query
  1. In the where field, use the dropdown menu to select the attribute you want to correlate on. The available attributes are from the logs returned by your main query.
  2. Define your subquery filter in the from field.
  3. In the Select Column dropdown menu that precedes the subquery filter, select the attribute you want to group and sort your subquery results by.
  4. Select whether you want to use the IN or NOT IN operator:
    • The results from the IN operator only include logs where the attribute value is also found in the subquery results. For example, if you only want to see logs generated by service:a and is associated to a user who is also one of the top users of service:b.
    • The results from the NOT IN operator exclude logs where the attribute value is found in the subquery results. For example, if you only want to see status:error logs, but also want to filter out status:error logs where users associated with those logs eventually end up with a status:success log. See Filter outdated or superseded logs for a detailed example.
  5. Optionally, reduce the number of subquery attribute values to match against. The default value and maximum value is 1000. Choose between top (highest frequency values) or bottom (lowest frequency values).

Subquery examples

The following are scenarios where you need to use subqueries to get the information you need from your logs.

Filter outdated or superseded logs

Assume you operate an e-commerce platform. A log is generated each time one of your customers tries to place an order. You want to analyze your logs to understand the total value of potential purchases lost due to an ongoing issue with your website.

You realize, however, that an order can fail multiple times before successfully completing: meaning for that particular order ID, there are log entries in the search results for both status:error logs and status:success logs. If you extracted lists of unique order IDs from the two queries, this order ID would appear in both. With subqueries, you can obtain a mutually exclusive list.

In this example, you are only interested in logs for orders that did not eventually succeed. To exclude orders that eventually succeeded using the subquery feature:

  1. Define a subquery for status:success logs.
  2. Select the NOT IN operator to exclude orders from the subquery result set.
The query editor showing the setup for filtering out orders that ended up succeeding

Correlate across different log sources

Assume you have a service named network_directory that monitors all internal network resources and access to those resources within your organization. Log events generated by this service include standard attributes (like host, service, source) and custom attributes like the client’s IP address.

Additionally, you have another device-manager service that tracks all internal assets (infrastructure, employee devices, and so on.)

You are investigating an ongoing attack and observe there is a significant increase in API requests across almost all of your endpoints. You want to first identify IP addresses associated with anomalous request volumes so that you can block them at the firewall level. However, your internal services are some of the largest consumers of these endpoints and you need to exclude them from your query results to avoid mistakenly blocking them.

In this example, use service:network_directory as your main query, and then define a subquery filter for your device-manager service to filter out results from recognized devices.

The query editor showing the setup for filtering out the results for recognized devices

Filter logs based on Reference Tables

Reference Tables are in beta. Reference Tables containing over 40,000 rows cannot be used to filter logs. See Add Custom Metadata with Reference Tables for more information on how to create and manage Reference Tables.

Reference Tables allow you to combine metadata with logs, providing more information to resolve application issues. Add a query filter based on a Reference Table to perform lookup queries. For more information on creating and managing this feature, see the Reference Tables guide.

To apply a query filter with Reference Tables:

  1. Click the Add button next to the query editor, and then select Join with Reference Table.
  2. Select your reference table in the dropdown menu.
  3. Select the log field you want to look for in the reference table.
  4. Select the IN or NOT IN operator depending on whether you want to find the field value in the specific column.

In the following example, the Reference Table query filter is used to search all recent logs that include a malicious IP address from a threat intel reference table:

The Datadog Log Explorer with reference table search options highlighted

Further reading

PREVIEWING: rtrieu/product-analytics-ui-changes