Sensitive Data Scanner Processor

Docs > Observability Pipelines（観測データの制御） > プロセッサー > Sensitive Data Scanner Processor

このページは日本語には対応しておりません。随時翻訳に取り組んでいます。
翻訳に関してご質問やご意見ございましたら、お気軽にご連絡ください。

Sensitive Data Scanner プロセッサはログをスキャンして、PII、PCI、カスタムの機密データなどの機密情報を検出し、マスキングまたはハッシュ化します。機密データをスキャンするには、当社のライブラリから定義済みのルールを選択するか、カスタムの正規表現ルールを入力できます。

Sensitive Data Scanner プロセッサをセットアップするには

フィルタークエリを定義します。指定したフィルタークエリに一致するログのみがスキャンおよび処理されます。フィルタークエリに一致するかどうかにかかわらず、すべてのログはパイプラインの次のステップに送信されます。
Add Scanning Rule をクリックします。
スキャンルールに名前を付けます。
Select scanning rule type フィールドで、ライブラリからルールを作成するか、カスタムルールを作成するかを選択します。
- ライブラリからルールを作成する場合は、使用するライブラリパターンを選択します。
- カスタムルールを作成する場合は、データに対して確認する正規表現パターンを入力します。
Scan entire or part of event セクションで、ドロップダウンメニューの Entire Event (イベント全体)、Specific Attributes (特定の属性)、Exclude Attributes (属性の除外) からスキャン対象を選択します。
- Specific Attributes (特定の属性) を選択した場合は、Add Field をクリックし、スキャンする特定の属性を入力します。最大 3 つのフィールドを追加できます。ネストされたキーにアクセスするには、パス記法 (outer_key.inner_key) を使用します。ネストされたデータを持つ特定の属性では、すべてのネストされたデータがスキャンされます。
- Exclude Attributes (属性の除外) を選択した場合は、Add Field をクリックし、スキャンから除外する特定の属性を入力します。最大 3 つのフィールドを追加できます。ネストされたキーにアクセスするには、パス記法 (outer_key.inner_key) を使用します。ネストされたデータを持つ指定された属性については、すべてのネストされたデータが除外されます。
Define action on match セクションで、一致した情報に対して実行するアクションを選択します。注: マスキング、部分的なマスキング、およびハッシュ化はすべて元に戻せないアクションです。
- 情報をマスキングする場合は、一致したデータを置き換えるテキストを指定します。
- 情報を部分的にマスキングする場合は、マスキングする文字数を指定し、部分的なマスキングを一致したデータの先頭または末尾に適用するかどうかを指定します。
- 注: ハッシュ化を選択した場合、一致した UTF-8 バイトは FarmHash の 64 ビットフィンガープリントでハッシュ化されます。
オプションとして、正規表現に一致するすべてのイベントにタグを追加し、イベントのフィルタリング、分析、アラートを行うことができます。

Add rules from the library

In the dropdown menu, select the library rule you want to use.
Recommended keywords are automatically added based on the library rule selected. After the scanning rule has been added, you can add additional keywords or remove recommended keywords.
In the Define rule target and action section, select if you want to scan the Entire Event, Specific Attributes, or Exclude Attributes in the dropdown menu.
- If you are scanning the entire event, you can optionally exclude specific attributes from getting scanned. Use path notation (outer_key.inner_key) to access nested keys. For specified attributes with nested data, all nested data is excluded.
- If you are scanning specific attributes, specify which attributes you want to scan. Use path notation (outer_key.inner_key) to access nested keys. For specified attributes with nested data, all nested data is scanned.
For Define actions on match, select the action you want to take for the matched information. Note: Redaction, partial redaction, and hashing are all irreversible actions.
- Redact: Replaces all matching values with the text you specify in the Replacement text field.
- Partially Redact: Replaces a specified portion of all matched data. In the Redact section, specify the number of characters you want to redact and which part of the matched data to redact.
- Hash: Replaces all matched data with a unique identifier. The UTF-8 bytes of the match are hashed with the 64-bit fingerprint of FarmHash.
Optionally, click Add Field to add tags you want to associate with the matched events.
Add a name for the scanning rule.
Optionally, add a description for the rule.
Click Save.

Path notation example

For the following message structure, use outer_key.inner_key.double_inner_key to refer to the key with the value double_inner_value.

{
    "outer_key": {
        "inner_key": "inner_value",
        "a": {
            "double_inner_key": "double_inner_value",
            "b": "b value"
        },
        "c": "c value"
    },
    "d": "d value"
}

Add additional keywords

After adding scanning rules from the library, you can edit each rule separately and add additional keywords to the keyword dictionary.

Navigate to your pipeline.
In the Sensitive Data Scanner processor with the rule you want to edit, click Manage Scanning Rules.
Toggle Use recommended keywords if you want the rule to use them. Otherwise, add your own keywords to the Create keyword dictionary field. You can also require that these keywords be within a specified number of characters of a match. By default, keywords must be within 30 characters before a matched value.
Click Update.

Add a custom rule

In the Define match conditions section, specify the regex pattern to use for matching against events in the Define the regex field. Enter sample data in the Add sample data field to verify that your regex pattern is valid. Sensitive Data Scanner supports Perl Compatible Regular Expressions (PCRE), but the following patterns are not supported:
- Backreferences and capturing sub-expressions (lookarounds)
- Arbitrary zero-width assertions
- Subroutine references and recursive patterns
- Conditional patterns
- Backtracking control verbs
- The \C “single-byte” directive (which breaks UTF-8 sequences)
- The \R newline match
- The \K start of match reset directive
- Callouts and embedded code
- Atomic grouping and possessive quantifiers
For Create keyword dictionary, add keywords to refine detection accuracy when matching regex conditions. For example, if you are scanning for a sixteen-digit Visa credit card number, you can add keywords like visa, credit, and card. You can also require that these keywords be within a specified number of characters of a match. By default, keywords must be within 30 characters before a matched value.
In the Define rule target and action section, select if you want to scan the Entire Event, Specific Attributes, or Exclude Attributes in the dropdown menu.
- If you are scanning the entire event, you can optionally exclude specific attributes from getting scanned. Use path notation (outer_key.inner_key) to access nested keys. For specified attributes with nested data, all nested data is excluded.
- If you are scanning specific attributes, specify which attributes you want to scan. Use path notation (outer_key.inner_key) to access nested keys. For specified attributes with nested data, all nested data is scanned.
For Define actions on match, select the action you want to take for the matched information. Note: Redaction, partial redaction, and hashing are all irreversible actions.
- Redact: Replaces all matching values with the text you specify in the Replacement text field.
- Partially Redact: Replaces a specified portion of all matched data. In the Redact section, specify the number of characters you want to redact and which part of the matched data to redact.
- Hash: Replaces all matched data with a unique identifier. The UTF-8 bytes of the match is hashed with the 64-bit fingerprint of FarmHash.
Optionally, click Add Field to add tags you want to associate with the matched events.
Add a name for the scanning rule.
Optionally, add a description for the rule.
Click Add Rule.

Path notation example

For the following message structure, use outer_key.inner_key.double_inner_key to refer to the key with the value double_inner_value.

{
    "outer_key": {
        "inner_key": "inner_value",
        "a": {
            "double_inner_key": "double_inner_value",
            "b": "b value"
        },
        "c": "c value"
    },
    "d": "d value"
}

Filter query syntax

Each processor has a corresponding filter query in their fields. Processors only process logs that match their filter query. And for all processors except the filter processor, logs that do not match the query are sent to the next step of the pipeline. For the filter processor, logs that do not match the query are dropped.

For any attribute, tag, or key:value pair that is not a reserved attribute, your query must start with @. Conversely, to filter reserved attributes, you do not need to append @ in front of your filter query.

For example, to filter out and drop status:info logs, your filter can be set as NOT (status:info). To filter out and drop system-status:info, your filter must be set as NOT (@system-status:info).

Filter query examples:

NOT (status:debug): This filters for only logs that do not have the status DEBUG.
status:ok service:flask-web-app: This filters for all logs with the status OK from your flask-web-app service.
- This query can also be written as: status:ok AND service:flask-web-app.
host:COMP-A9JNGYK OR host:COMP-J58KAS: This filter query only matches logs from the labeled hosts.
@user.status:inactive: This filters for logs with the status inactive nested under the user attribute.

Queries run in the Observability Pipelines Worker are case sensitive. Learn more about writing filter queries in Datadog’s Log Search Syntax.