- 필수 기능
- 시작하기
- Glossary
- 표준 속성
- Guides
- Agent
- 통합
- 개방형텔레메트리
- 개발자
- API
- Datadog Mobile App
- CoScreen
- Cloudcraft
- 앱 내
- 서비스 관리
- 인프라스트럭처
- 애플리케이션 성능
- APM
- Continuous Profiler
- 스팬 시각화
- 데이터 스트림 모니터링
- 데이터 작업 모니터링
- 디지털 경험
- 소프트웨어 제공
- 보안
- AI Observability
- 로그 관리
- 관리
Sensitive data, such as credit card numbers, bank routing numbers, and API keys, can be revealed unintentionally in your logs, which can expose your organization to financial and privacy risks.
Use the Observability Pipelines to identify, tag, and optionally redact or hash sensitive information before routing logs to different destinations and outside of your infrastructure. You can use out-of-the-box scanning rules to detect common patterns such as email addresses, credit card numbers, API keys, authorization tokens, and more. You can also create custom scanning rules using regex patterns to match sensitive information.
This document walks you through the following steps:
To use Observability Pipelines’s Syslog source, your applications must be sending data in one of the following formats: RFC 6587, RFC 5424, RFC 3164. You also need to have the following information available:
0.0.0.0:8088
. Later on, you configure your applications to send logs to this address.To configure your Syslog source:
Server Certificate Path
: The path to the certificate file that has been signed by your Certificate Authority (CA) Root File in DER or PEM (X.509) format.CA Certificate Path
: The path to the certificate file that is your Certificate Authority (CA) Root File in DER or PEM (X.509) format.Private Key Path
: The path to the .key
private key file that belongs to your Server Certificate Path in DER or PEM (PKCS#8) format.Enter the following information based on your selected logs destination.
There are no configuration steps for your Datadog destination.
The following fields are optional:
true
, Splunk extracts the timestamp from the message with the expected format of yyyy-mm-dd hh:mm:ss
.sourcetype
to override Splunk’s default value, which is httpevent
for HEC data.The following fields are optional:
JSON
, Logfmt
, or Raw
text. If no decoding is selected, the decoding defaults to JSON.name
value configured for your Sumo Logic collector’s source.host
value configured for your Sumo Logic collector’s source.category
value configured for your Sumo Logic collector’s source.The rsyslog and syslog-ng destinations match these log fields to the following Syslog fields:
Log Event | SYSLOG FIELD | Default |
---|---|---|
log[“message”] | MESSAGE | NIL |
log[“procid”] | PROCID | The running Worker’s process ID. |
log[“appname”] | APP-NAME | observability_pipelines |
log[“facility”] | FACILITY | 8 (log_user) |
log[“msgid”] | MSGID | NIL |
log[“severity”] | SEVERITY | info |
log[“host”] | HOSTNAME | NIL |
log[“timestamp”] | TIMESTAMP | Current UTC time. |
The following destination settings are optional:
Server Certificate Path
: The path to the certificate file that has been signed by your Certificate Authority (CA) Root File in DER or PEM (X.509).CA Certificate Path
: The path to the certificate file that is your Certificate Authority (CA) Root File in DER or PEM (X.509).Private Key Path
: The path to the .key
private key file that belongs to your Server Certificate Path in DER or PEM (PKCS#8) format.To authenticate the Observability Pipelines Worker for Google Chronicle, contact your Google Security Operations representative for a Google Developer Service Account Credential. This credential is a JSON file and must be placed under DD_OP_DATA_DIR/config
. See Getting API authentication credential for more information.
Note: If you are installing the Worker in Kubernetes, see Referencing files in Kubernetes for information on how to reference the credentials file.
To set up the Worker’s Google Chronicle destination:
Note: Logs sent to the Google Chronicle destination must have ingestion labels. For example, if the logs are from a A10 load balancer, it must have the ingestion label A10_LOAD_BALANCER
. See Google Cloud’s Support log types with a default parser for a list of available log types and their respective ingestion labels.
The following fields are optional:
Optionally, enter the name of the OpenSearch index.
There are pre-selected processors added to your processor group out of the box. You can add additional processors or delete any existing ones based on your processing needs.
Processor groups are executed from top to bottom. The order of the processors is important because logs are checked by each processor, but only logs that match the processor’s filters are processed. To modify the order of the processors, use the drag handle on the top left corner of the processor you want to move.
Each processor has a corresponding filter query in their fields. Processors only process logs that match their filter query. And for all processors except the filter processor, logs that do not match the query are sent to the next step of the pipeline. For the filter processor, logs that do not match the query are dropped.
For any attribute, tag, or key:value
pair that is not a reserved attribute, your query must start with @
. Conversely, to filter reserved attributes, you do not need to append @
in front of your filter query.
For example, to filter out and drop status:info
logs, your filter can be set as NOT (status:info)
. To filter out and drop system-status:info
, your filter must be set as NOT (@system-status:info)
.
Filter query examples:
NOT (status:debug)
: This filters for only logs that do not have the status DEBUG
.status:ok service:flask-web-app
: This filters for all logs with the status OK
from your flask-web-app
service.status:ok AND service:flask-web-app
.host:COMP-A9JNGYK OR host:COMP-J58KAS
: This filter query only matches logs from the labeled hosts.@user.status:inactive
: This filters for logs with the status inactive
nested under the user
attribute.Learn more about writing filter queries in Datadog’s Log Search Syntax.
Enter the information for the processors you want to use. Click the Add button to add additional processors. To delete a processor, click the kebab on the right side of the processor and select Delete.
This processor filters for logs that match the specified filter query and drops all non-matching logs. If a log is dropped at this processor, then none of the processors below this one receives that log. This processor can filter out unnecessary logs, such as debug or warning logs.
To set up the filter processor:
The remap processor can add, drop, or rename fields within your individual log data. Use this processor to enrich your logs with additional context, remove low-value fields to reduce volume, and standardize naming across important attributes. Select add field, drop field, or rename field in the dropdown menu to get started.
Use add field to append a new key-value field to your log.
To set up the add field processor:
<OUTER_FIELD>.<INNER_FIELD>
. All values are stored as strings.
Note: If the field you want to add already exists, the Worker throws an error and the existing field remains unchanged.Use drop field to drop a field from logging data that matches the filter you specify below.
To set up the drop field processor:
<OUTER_FIELD>.<INNER_FIELD>
.
Note: If your specified key does not exist, your log will be unimpacted.Use rename field to rename a field within your log.
To set up the rename field processor:
<OUTER_FIELD>.<INNER_FIELD>
. Once renamed, your original field is deleted unless you enable the Preserve source tag checkbox described below.null
value is applied to your target.<OUTER_FIELD>.<INNER_FIELD>
.For the following message structure, use outer_key.inner_key.double_inner_key
to refer to the key with the value double_inner_value
.
{
"outer_key": {
"inner_key": "inner_value",
"a": {
"double_inner_key": "double_inner_value",
"b": "b value"
},
"c": "c value"
},
"d": "d value"
}
This processor samples your logging traffic for a representative subset at the rate that you define, dropping the remaining logs. As an example, you can use this processor to sample 20% of logs from a noisy non-critical service.
The sampling only applies to logs that match your filter query and does not impact other logs. If a log is dropped at this processor, none of the processors below receives that log.
To set up the sample processor:
2
means 2% of logs are retained out of all the logs that match the filter query.This processor parses logs using the grok parsing rules that are available for a set of sources. The rules are automatically applied to logs based on the log source. Therefore, logs must have a source
field with the source name. If this field is not added when the log is sent to the Observability Pipelines Worker, you can use the Add field processor to add it.
If the source
field of a log matches one of the grok parsing rule sets, the log’s message
field is checked against those rules. If a rule matches, the resulting parsed data is added in the message
field as a JSON object, overwriting the original message
.
If there isn’t a source
field on the log, or no rule matches the log message
, then no changes are made to the log and it is sent to the next step in the pipeline.
To set up the grok parser:
The quota processor measures the logging traffic for logs that match the filter you specify. When the configured daily quota is met inside the 24-hour rolling window, the processor can either drop additional logs or send an alert using a Datadog monitor. You can configure the processor to track the total volume or the total number of events.
As an example, you can configure this processor to drop new logs or trigger an alert without dropping logs after the processor has received 10 million events from a certain service in the last 24 hours.
To set up the quota processor:
Events
or by the Volume
in bytes.The reduce processor groups multiple log events into a single log, based on the fields specified and the merge strategies selected. Logs are grouped at 10-second intervals. After the interval has elapsed for the group, the reduced log for that group is sent to the next step in the pipeline.
To set up the reduce processor:
These are the available merge strategies for combining log events.
Name | Description |
---|---|
Array | Appends each value to an array. |
Concat | Concatenates each string value, delimited with a space. |
Concat newline | Concatenates each string value, delimited with a newline. |
Concat raw | Concatenates each string value, without a delimiter. |
Discard | Discards all values except the first value that was received. |
Flat unique | Creates a flattened array of all unique values that were received. |
Longest array | Keeps the longest array that was received. |
Max | Keeps the maximum numeric value that was received. |
Min | Keeps the minimum numeric value that was received. |
Retain | Discards all values except the last value that was received. Works as a way to coalesce by not retaining `null`. |
Shortest array | Keeps the shortest array that was received. |
Sum | Sums all numeric values that were received. |
The deduplicate processor removes copies of data to reduce volume and noise. It caches 5,000 messages at a time and compares your incoming logs traffic against the cached messages. For example, this processor can be used to keep only unique warning logs in the case where multiple identical warning logs are sent in succession.
To set up the deduplicate processor:
Match
on or Ignore
the fields specified below.Match
is selected, then after a log passes through, future logs that have the same values for all of the fields you specify below are removed.Ignore
is selected, then after a log passes through, future logs that have the same values for all of their fields, except the ones you specify below, are removed.<OUTER_FIELD>.<INNER_FIELD>
to match subfields. See the Path notation example below.For the following message structure, use outer_key.inner_key.double_inner_key
to refer to the key with the value double_inner_value
.
{
"outer_key": {
"inner_key": "inner_value",
"a": {
"double_inner_key": "double_inner_value",
"b": "b value"
},
"c": "c value"
},
"d": "d value"
}
The Sensitive Data Scanner processor scans logs to detect and redact or hash sensitive information such as PII, PCI, and custom sensitive data. You can pick from our library of predefined rules, or input custom Regex rules to scan for sensitive data.
To set up the sensitive data scanner processor:
outer_key.inner_key
) to access nested keys. For specified attributes with nested data, all nested data is scanned.outer_key.inner_key
) to access nested keys. For specified attributes with nested data, all nested data is excluded.This processor adds a field with the name of the host that sent the log. For example, hostname: 613e197f3526
. Note: If the hostname
already exists, the Worker throws an error and does not overwrite the existing hostname
.
To set up this processor:
This processor converts the specified field into JSON objects.
To set up this processor:
Use this processor to enrich your logs with information from a reference table, which could be a local file or database.
To set up the enrichment table processor:
For this example, merchant_id
is used as the source attribute and merchant_info
as the target attribute.
This is the example reference table that the enrichment processor uses:
merch_id | merchant_name | city | state |
---|---|---|---|
803 | Andy’s Ottomans | Boise | Idaho |
536 | Cindy’s Couches | Boulder | Colorado |
235 | Debra’s Benches | Las Vegas | Nevada |
merch_id
is set as the column name the processor uses to find the source attribute’s value. Note: The source attribute’s value does not have to match the column name.
If the enrichment processor receives a log with "merchant_id":"536"
:
536
in the reference table’s merch_id
column.merchant_info
attribute as a JSON object:merchant_info {
"merchant_name":"Cindy's Couches",
"city":"Boulder",
"state":"Colorado"
}
Select your platform in the Choose your installation platform dropdown menu.
Enter the Syslog address. This is a Syslog-compatible endpoint, exposed by the Worker, that your applications send logs to. The Observability Pipelines Worker listens on this address for incoming logs.
Provide the environment variables for each of your selected destinations. See prerequisites for more information.
There are no environment variables to configure for Datadog Log Management.
Enter your Splunk HEC token and the base URL of the Splunk instance. See prerequisites for more information.
The Worker passes the HEC token to the Splunk collection endpoint. After the Observability Pipelines Worker processes the logs, it sends the logs to the specified Splunk instance URL.
Note: The Splunk HEC destination forwards all logs to the /services/collector/event
endpoint regardless of whether you configure your Splunk HEC destination to encode your output in JSON
or raw
.
Enter the Sumo Logic HTTP collector URL. See prerequisites for more information.
Enter the rsyslog or syslog-ng endpoint URL. For example, 127.0.0.1:9997
. The Observability Pipelines Worker sends logs to this address and port.
Enter the Google Chronicle endpoint URL. For example, https://chronicle.googleapis.com
.
http://CLUSTER_ID.LOCAL_HOST_IP.ip.es.io:9200
.http://<hostname.IP>:9200
.http://<hostname.IP>:9200
.Follow the instructions for your environment to install the Worker.
docker run -i -e DD_API_KEY=<DATADOG_API_KEY> \
-e DD_OP_PIPELINE_ID=<PIPELINE_ID> \
-e DD_SITE=<DATADOG_SITE> \
-e <SOURCE_ENV_VARIABLE> \
-e <DESTINATION_ENV_VARIABLE> \
-p 8088:8088 \
datadog/observability-pipelines-worker run
docker run
command exposes the same port the Worker is listening on. If you want to map the Worker’s container port to a different port on the Docker host, use the -p | --publish
option in the command:-p 8282:8088 datadog/observability-pipelines-worker run
See Update Existing Pipelines if you want to make changes to your pipeline’s configuration.
helm repo add datadog https://helm.datadoghq.com
helm repo update
helm upgrade --install opw \
-f aws_eks.yaml \
--set datadog.apiKey=<DATADOG_API_KEY> \
--set datadog.pipelineId=<PIPELINE_ID> \
--set <SOURCE_ENV_VARIABLES> \
--set <DESTINATION_ENV_VARIABLES> \
--set service.ports[0].protocol=TCP,service.ports[0].port=<SERVICE_PORT>,service.ports[0].targetPort=<TARGET_PORT> \
datadog/observability-pipelines-worker
<SERVICE_PORT>
to the port the Worker is listening on (<TARGET_PORT>
). If you want to map the Worker’s pod port to a different incoming port of the Kubernetes Service, use the following service.ports[0].port
and service.ports[0].targetPort
values in the command:--set service.ports[0].protocol=TCP,service.ports[0].port=8088,service.ports[0].targetPort=8282
See Update Existing Pipelines if you want to make changes to your pipeline’s configuration.
helm repo add datadog https://helm.datadoghq.com
helm repo update
helm upgrade --install opw \
-f azure_aks.yaml \
--set datadog.apiKey=<DATADOG_API_KEY> \
--set datadog.pipelineId=<PIPELINE_ID> \
--set <SOURCE_ENV_VARIABLES> \
--set <DESTINATION_ENV_VARIABLES> \
--set service.ports[0].protocol=TCP,service.ports[0].port=<SERVICE_PORT>,service.ports[0].targetPort=<TARGET_PORT> \
datadog/observability-pipelines-worker
<SERVICE_PORT>
to the port the Worker is listening on (<TARGET_PORT>
). If you want to map the Worker’s pod port to a different incoming port of the Kubernetes Service, use the following service.ports[0].port
and service.ports[0].targetPort
values in the command:--set service.ports[0].protocol=TCP,service.ports[0].port=8088,service.ports[0].targetPort=8282
See Update Existing Pipelines if you want to make changes to your pipeline’s configuration.
helm repo add datadog https://helm.datadoghq.com
helm repo update
helm upgrade --install opw \
-f google_gke.yaml \
--set datadog.apiKey=<DATADOG_API_KEY> \
--set datadog.pipelineId=<PIPELINE_ID> \
--set <SOURCE_ENV_VARIABLES> \
--set <DESTINATION_ENV_VARIABLES> \
--set service.ports[0].protocol=TCP,service.ports[0].port=<SERVICE_PORT>,service.ports[0].targetPort=<TARGET_PORT> \
datadog/observability-pipelines-worker
<SERVICE_PORT>
to the port the Worker is listening on (<TARGET_PORT>
). If you want to map the Worker’s pod port to a different incoming port of the Kubernetes Service, use the following service.ports[0].port
and service.ports[0].targetPort
values in the command:--set service.ports[0].protocol=TCP,service.ports[0].port=8088,service.ports[0].targetPort=8282
See Update Existing Pipelines if you want to make changes to your pipeline’s configuration.
Click Select API key to choose the Datadog API key you want to use.
Run the one-step command provided in the UI to install the Worker.
Note: The environment variables used by the Worker in /etc/default/observability-pipelines-worker
are not updated on subsequent runs of the install script. If changes are needed, update the file manually and restart the Worker.
If you prefer not to use the one-line installation script, follow these step-by-step instructions:
sudo apt-get update
sudo apt-get install apt-transport-https curl gnupg
deb
repo on your system and create a Datadog archive keyring:sudo sh -c "echo 'deb [signed-by=/usr/share/keyrings/datadog-archive-keyring.gpg] https://apt.datadoghq.com/ stable observability-pipelines-worker-2' > /etc/apt/sources.list.d/datadog-observability-pipelines-worker.list"
sudo touch /usr/share/keyrings/datadog-archive-keyring.gpg
sudo chmod a+r /usr/share/keyrings/datadog-archive-keyring.gpg
curl https://keys.datadoghq.com/DATADOG_APT_KEY_CURRENT.public | sudo gpg --no-default-keyring --keyring /usr/share/keyrings/datadog-archive-keyring.gpg --import --batch
curl https://keys.datadoghq.com/DATADOG_APT_KEY_06462314.public | sudo gpg --no-default-keyring --keyring /usr/share/keyrings/datadog-archive-keyring.gpg --import --batch
curl https://keys.datadoghq.com/DATADOG_APT_KEY_F14F620E.public | sudo gpg --no-default-keyring --keyring /usr/share/keyrings/datadog-archive-keyring.gpg --import --batch
curl https://keys.datadoghq.com/DATADOG_APT_KEY_C0962C7D.public | sudo gpg --no-default-keyring --keyring /usr/share/keyrings/datadog-archive-keyring.gpg --import --batch
apt
repo and install the Worker:sudo apt-get update
sudo apt-get install observability-pipelines-worker datadog-signing-keys
datadoghq.com
for US1), source, and destination environment variables to the Worker’s environment file:sudo cat <<EOF > /etc/default/observability-pipelines-worker
DD_API_KEY=<DATADOG_API_KEY>
DD_OP_PIPELINE_ID=<PIPELINE_ID>
DD_SITE=<DATADOG_SITE>
<SOURCE_ENV_VARIABLES>
<DESTINATION_ENV_VARIABLES>
EOF
sudo systemctl restart observability-pipelines-worker
See Update Existing Pipelines if you want to make changes to your pipeline’s configuration.
Click Select API key to choose the Datadog API key you want to use.
Run the one-step command provided in the UI to install the Worker.
Note: The environment variables used by the Worker in /etc/default/observability-pipelines-worker
are not updated on subsequent runs of the install script. If changes are needed, update the file manually and restart the Worker.
If you prefer not to use the one-line installation script, follow these step-by-step instructions:
rpm
repo on your system with the below command. Note: If you are running RHEL 8.1 or CentOS 8.1, use repo_gpgcheck=0
instead of repo_gpgcheck=1
in the configuration below.cat <<EOF > /etc/yum.repos.d/datadog-observability-pipelines-worker.repo
[observability-pipelines-worker]
name = Observability Pipelines Worker
baseurl = https://yum.datadoghq.com/stable/observability-pipelines-worker-2/\$basearch/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://keys.datadoghq.com/DATADOG_RPM_KEY_CURRENT.public
https://keys.datadoghq.com/DATADOG_RPM_KEY_B01082D3.public
EOF
sudo yum makecache
sudo yum install observability-pipelines-worker
datadoghq.com
for US1), source, and destination environment variables to the Worker’s environment file:sudo cat <<-EOF > /etc/default/observability-pipelines-worker
DD_API_KEY=<API_KEY>
DD_OP_PIPELINE_ID=<PIPELINE_ID>
DD_SITE=<SITE>
<SOURCE_ENV_VARIABLES>
<DESTINATION_ENV_VARIABLES>
EOF
sudo systemctl restart observability-pipelines-worker
See Update Existing Pipelines if you want to make changes to your pipeline’s configuration.
Select one of the options in the dropdown to provide the expected log volume for the pipeline:
Option | Description |
---|---|
Unsure | Use this option if you are not able to project the log volume or you want to test the Worker. This option provisions the EC2 Auto Scaling group with a maximum of 2 general purpose t4g.large instances. |
1-5 TB/day | This option provisions the EC2 Auto Scaling group with a maximum of 2 compute optimized instances c6g.large . |
5-10 TB/day | This option provisions the EC2 Auto Scaling group with a minimum of 2 and a maximum of 5 compute optimized c6g.large instances. |
>10 TB/day | Datadog recommends this option for large-scale production deployments. It provisions the EC2 Auto Scaling group with a minimum of 2 and a maximum of 10 compute optimized c6g.xlarge instances. |
Note: All other parameters are set to reasonable defaults for a Worker deployment, but you can adjust them for your use case as needed in the AWS Console before creating the stack.
Select the AWS region you want to use to install the Worker.
Click Select API key to choose the Datadog API key you want to use.
Click Launch CloudFormation Template to navigate to the AWS Console to review the stack configuration and then launch it. Make sure the CloudFormation parameters are as expected.
Select the VPC and subnet you want to use to install the Worker.
Review and check the necessary permissions checkboxes for IAM. Click Submit to create the stack. CloudFormation handles the installation at this point; the Worker instances are launched, the necessary software is downloaded, and the Worker starts automatically.
Navigate back to the Observability Pipelines installation page and click Deploy.
See Update Existing Pipelines if you want to make changes to your pipeline’s configuration.
To send rsyslog logs to the Observability Pipelines Worker, update your rsyslog config file:
ruleset(name="infiles") {
action(type="omfwd" protocol="tcp" target="<OPW_HOST>" port="<OPW_PORT>")
}
<OPW_HOST>
is the IP/URL of the host (or load balancer) associated with the Observability Pipelines Worker.
LoadBalancerDNS
CloudFormation output has the correct URL to use.opw-observability-pipelines-worker.default.svc.cluster.local
.To send syslog-ng logs to the Observability Pipelines Worker, update your syslog-ng config file:
destination obs_pipelines {
http(
url("<OPW_HOST>")
method("POST")
body("<${PRI}>1 ${ISODATE} ${HOST:--} ${PROGRAM:--} ${PID:--} ${MSGID:--} ${SDATA:--} $MSG\n")
);
};
<OPW_HOST>
is the IP/URL of the host (or load balancer) associated with the Observability Pipelines Worker.
LoadBalancerDNS
CloudFormation output has the correct URL to use.opw-observability-pipelines-worker.default.svc.cluster.local
.