gcp_dataplex_data_scan
ancestors
Type: UNORDERED_LIST_STRING
create_time
Type: TIMESTAMP
Provider name: createTime
Description: Output only. The time when the scan was created.
data
Type: STRUCT
Provider name: data
Description: Required. The data source for DataScan.
entity
Type: STRING
Provider name: entity
Description: Immutable. The Dataplex entity that represents the data source (e.g. BigQuery table) for DataScan, of the form: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}/entities/{entity_id}.
resource
Type: STRING
Provider name: resource
Description: Immutable. The service-qualified full resource name of the cloud resource for a DataScan job to scan against. The field could be: BigQuery table of type "TABLE" for DataProfileScan/DataQualityScan. Format: //bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID
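The entity and resource fields are the two alternative ways of pointing a DataScan at its source. A minimal sketch of both forms, using made-up project, lake, and table identifiers (not values from this reference):

```python
# Hypothetical identifiers, shown only to illustrate the two formats above.
bigquery_table_source = {
    "resource": "//bigquery.googleapis.com/projects/my-project/datasets/sales/tables/orders"
}
dataplex_entity_source = {
    "entity": "projects/123456789/locations/us-central1/lakes/my-lake/zones/raw-zone/entities/orders"
}
```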
data_discovery_result
Type: STRUCT
Provider name: dataDiscoveryResult
Description: Output only. The result of a data discovery scan.
bigquery_publishing
Type: STRUCT
Provider name: bigqueryPublishing
Description: Output only. Configuration for metadata publishing.
dataset
Type: STRING
Provider name: dataset
Description: Output only. The BigQuery dataset the discovered tables are published to.
location
Type: STRING
Provider name: location
Description: Output only. The location of the BigQuery publishing dataset.
scan_statistics
Type: STRUCT
Provider name: scanStatistics
Description: Output only. Statistics of the DataDiscoveryScan.
data_processed_bytes
Type: INT64
Provider name: dataProcessedBytes
Description: The data processed in bytes.
files_excluded
Type: INT32
Provider name: filesExcluded
Description: The number of files excluded.
filesets_created
Type: INT32
Provider name: filesetsCreated
Description: The number of filesets created.
filesets_deleted
Type: INT32
Provider name: filesetsDeleted
Description: The number of filesets deleted.
filesets_updated
Type: INT32
Provider name: filesetsUpdated
Description: The number of filesets updated.
scanned_file_count
Type: INT32
Provider name: scannedFileCount
Description: The number of files scanned.
tables_created
Type: INT32
Provider name: tablesCreated
Description: The number of tables created.
tables_deleted
Type: INT32
Provider name: tablesDeleted
Description: The number of tables deleted.
tables_updated
Type: INT32
Provider name: tablesUpdated
Description: The number of tables updated.
data_discovery_spec
Type: STRUCT
Provider name: dataDiscoverySpec
Description: Settings for a data discovery scan.
bigquery_publishing_config
Type: STRUCT
Provider name: bigqueryPublishingConfig
Description: Optional. Configuration for metadata publishing.
connection
Type: STRING
Provider name: connection
Description: Optional. The BigQuery connection used to create BigLake tables. Must be in the form projects/{project_id}/locations/{location_id}/connections/{connection_id}
location
Type: STRING
Provider name: location
Description: Optional. The location of the BigQuery dataset to publish BigLake external or non-BigLake external tables to. 1. If the Cloud Storage bucket is located in a multi-region bucket, then the BigQuery dataset can be in the same multi-region bucket or any single region that is included in the same multi-region bucket. The datascan can be created in any single region that is included in the same multi-region bucket. 2. If the Cloud Storage bucket is located in a dual-region bucket, then the BigQuery dataset can be located in regions that are included in the dual-region bucket, or in a multi-region that includes the dual-region. The datascan can be created in any single region that is included in the same dual-region bucket. 3. If the Cloud Storage bucket is located in a single region, then the BigQuery dataset can be in the same single region or any multi-region bucket that includes the same single region. The datascan will be created in the same single region as the bucket. 4. If the BigQuery dataset is in a single region, it must be in the same single region as the datascan. For supported values, refer to https://cloud.google.com/bigquery/docs/locations#supported_locations.
table_type
Type: STRING
Provider name: tableType
Description: Optional. Determines whether to publish discovered tables as BigLake external tables or non-BigLake external tables.
Possible values:
TABLE_TYPE_UNSPECIFIED
- Table type unspecified.
EXTERNAL
- Default. Discovered tables are published as BigQuery external tables whose data is accessed using the credentials of the user querying the table.
BIGLAKE
- Discovered tables are published as BigLake external tables whose data is accessed using the credentials of the associated BigQuery connection.
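Putting the publishing fields together, a hedged sketch of a bigqueryPublishingConfig payload; the connection, project, and location values below are placeholders, not defaults:

```python
# Illustrative dataDiscoverySpec.bigqueryPublishingConfig; all IDs are made up.
bigquery_publishing_config = {
    "tableType": "BIGLAKE",  # publish discovered tables as BigLake external tables
    "connection": "projects/my-project/locations/us-central1/connections/my-biglake-connection",
    "location": "us-central1",  # must satisfy the bucket-location rules described above
}
```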
storage_config
Type: STRUCT
Provider name: storageConfig
Description: Cloud Storage related configurations.
csv_options
Type: STRUCT
Provider name: csvOptions
Description: Optional. Configuration for CSV data.
delimiter
Type: STRING
Provider name: delimiter
Description: Optional. The delimiter that is used to separate values. The default is , (comma).
encoding
Type: STRING
Provider name: encoding
Description: Optional. The character encoding of the data. The default is UTF-8.
header_rows
Type: INT32
Provider name: headerRows
Description: Optional. The number of rows to interpret as header rows that should be skipped when reading data rows.
quote
Type: STRING
Provider name: quote
Description: Optional. The character used to quote column values. Accepts " (double quotation mark) or ' (single quotation mark). If unspecified, defaults to " (double quotation mark).
type_inference_disabled
Type: BOOLEAN
Provider name: typeInferenceDisabled
Description: Optional. Whether to disable the inference of data types for CSV data. If true, all columns are registered as strings.
exclude_patterns
Type: UNORDERED_LIST_STRING
Provider name: excludePatterns
Description: Optional. Defines the data to exclude during discovery. Provide a list of patterns that identify the data to exclude. For Cloud Storage bucket assets, these patterns are interpreted as glob patterns used to match object names. For BigQuery dataset assets, these patterns are interpreted as patterns to match table names.
include_patterns
Type: UNORDERED_LIST_STRING
Provider name: includePatterns
Description: Optional. Defines the data to include during discovery when only a subset of the data should be considered. Provide a list of patterns that identify the data to include. For Cloud Storage bucket assets, these patterns are interpreted as glob patterns used to match object names. For BigQuery dataset assets, these patterns are interpreted as patterns to match table names.
json_options
Type: STRUCT
Provider name: jsonOptions
Description: Optional. Configuration for JSON data.
encoding
Type: STRING
Provider name: encoding
Description: Optional. The character encoding of the data. The default is UTF-8.
type_inference_disabled
Type: BOOLEAN
Provider name: typeInferenceDisabled
Description: Optional. Whether to disable the inference of data types for JSON data. If true, all columns are registered as their primitive types (string, number, or boolean).
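As an example of how these storage settings combine, here is a hedged sketch of a storageConfig for a bucket-based discovery scan; all patterns and option values are invented for illustration:

```python
# Hypothetical storageConfig: scan CSV/JSON objects under data/, skip temp files.
storage_config = {
    "includePatterns": ["data/**/*.csv", "data/**/*.json"],
    "excludePatterns": ["**/_temporary/**"],
    "csvOptions": {
        "delimiter": ",",
        "encoding": "UTF-8",
        "headerRows": 1,               # skip one header row when reading data rows
        "typeInferenceDisabled": False,
    },
    "jsonOptions": {
        "encoding": "UTF-8",
        "typeInferenceDisabled": False,
    },
}
```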
data_profile_result
Type: STRUCT
Provider name: dataProfileResult
Description: Output only. The result of a data profile scan.
post_scan_actions_result
Type: STRUCT
Provider name: postScanActionsResult
Description: Output only. The result of post scan actions.
bigquery_export_result
Type: STRUCT
Provider name: bigqueryExportResult
Description: Output only. The result of BigQuery export post scan action.
message
Type: STRING
Provider name: message
Description: Output only. Additional information about the BigQuery exporting.
state
Type: STRING
Provider name: state
Description: Output only. Execution state for the BigQuery exporting.
Possible values:
STATE_UNSPECIFIED
- The exporting state is unspecified.
SUCCEEDED
- The exporting completed successfully.
FAILED
- The exporting is no longer running due to an error.
SKIPPED
- The exporting is skipped because there is no valid scan result to export (usually caused by a failed scan).
profile
Type: STRUCT
Provider name: profile
Description: The profile information per field.
fields
Type: UNORDERED_LIST_STRUCT
Provider name: fields
Description: List of fields with structural and profile information for each field.
mode
Type: STRING
Provider name: mode
Description: The mode of the field. Possible values include: REQUIRED, if it is a required field. NULLABLE, if it is an optional field. REPEATED, if it is a repeated field.
name
Type: STRING
Provider name: name
Description: The name of the field.
profile
Type: STRUCT
Provider name: profile
Description: Profile information for the corresponding field.
distinct_ratio
Type: DOUBLE
Provider name: distinctRatio
Description: Ratio of rows with distinct values against total scanned rows. Not available for complex non-groupable field type, including RECORD, ARRAY, GEOGRAPHY, and JSON, as well as fields with REPEATABLE mode.
double_profile
Type: STRUCT
Provider name: doubleProfile
Description: Double type field information.
average
Type: DOUBLE
Provider name: average
Description: Average of non-null values in the scanned data. NaN, if the field has a NaN.
max
Type: DOUBLE
Provider name: max
Description: Maximum of non-null values in the scanned data. NaN, if the field has a NaN.
min
Type: DOUBLE
Provider name: min
Description: Minimum of non-null values in the scanned data. NaN, if the field has a NaN.
quartiles
Type: UNORDERED_LIST_DOUBLE
Provider name: quartiles
Description: A quartile divides the number of data points into four parts, or quarters, of more-or-less equal size. The three main quartiles are: The first quartile (Q1) splits off the lowest 25% of data from the highest 75%. It is also known as the lower or 25th empirical quartile, as 25% of the data is below this point. The second quartile (Q2) is the median of a data set, so 50% of the data lies below this point. The third quartile (Q3) splits off the highest 25% of data from the lowest 75%. It is known as the upper or 75th empirical quartile, as 75% of the data lies below this point. Here, the quartiles are provided as an ordered list of quartile values for the scanned data, occurring in order Q1, median, Q3.
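A small worked example of the Q1/median/Q3 ordering, computed with Python's standard library (the scan itself may use approximate quantiles, so exact values can differ):

```python
import statistics

values = [2, 4, 4, 5, 7, 9, 10, 12]
q1, median, q3 = statistics.quantiles(values, n=4)
print(q1, median, q3)  # 4.0 6.0 9.75 -> reported in order Q1, median, Q3
```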
standard_deviation
Type: DOUBLE
Provider name: standardDeviation
Description: Standard deviation of non-null values in the scanned data. NaN, if the field has a NaN.
integer_profile
Type: STRUCT
Provider name: integerProfile
Description: Integer type field information.
average
Type: DOUBLE
Provider name: average
Description: Average of non-null values in the scanned data. NaN, if the field has a NaN.
max
Type: INT64
Provider name: max
Description: Maximum of non-null values in the scanned data. NaN, if the field has a NaN.
min
Type: INT64
Provider name: min
Description: Minimum of non-null values in the scanned data. NaN, if the field has a NaN.
quartiles
Type: UNORDERED_LIST_INT64
Provider name: quartiles
Description: A quartile divides the number of data points into four parts, or quarters, of more-or-less equal size. The three main quartiles are: The first quartile (Q1) splits off the lowest 25% of data from the highest 75%. It is also known as the lower or 25th empirical quartile, as 25% of the data is below this point. The second quartile (Q2) is the median of a data set, so 50% of the data lies below this point. The third quartile (Q3) splits off the highest 25% of data from the lowest 75%. It is known as the upper or 75th empirical quartile, as 75% of the data lies below this point. Here, the quartiles are provided as an ordered list of approximate quartile values for the scanned data, occurring in order Q1, median, Q3.
standard_deviation
Type: DOUBLE
Provider name: standardDeviation
Description: Standard deviation of non-null values in the scanned data. NaN, if the field has a NaN.
null_ratio
Type: DOUBLE
Provider name: nullRatio
Description: Ratio of rows with null value against total scanned rows.
string_profile
Type: STRUCT
Provider name: stringProfile
Description: String type field information.
average_length
Type: DOUBLE
Provider name: averageLength
Description: Average length of non-null values in the scanned data.
max_length
Type: INT64
Provider name: maxLength
Description: Maximum length of non-null values in the scanned data.
min_length
Type: INT64
Provider name: minLength
Description: Minimum length of non-null values in the scanned data.
top_n_values
Type: UNORDERED_LIST_STRUCT
Provider name: topNValues
Description: The list of top N non-null values, frequency and ratio with which they occur in the scanned data. N is 10 or equal to the number of distinct values in the field, whichever is smaller. Not available for complex non-groupable field type, including RECORD, ARRAY, GEOGRAPHY, and JSON, as well as fields with REPEATABLE mode.
count
Type: INT64
Provider name: count
Description: Count of the corresponding value in the scanned data.
ratio
Type: DOUBLE
Provider name: ratio
Description: Ratio of the corresponding value in the field against the total number of rows in the scanned data.
value
Type: STRING
Provider name: value
Description: String value of a top N non-null value.
type
Type: STRING
Provider name: type
Description: The data type retrieved from the schema of the data source. For instance, for a BigQuery native table, it is the BigQuery Table Schema (https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#tablefieldschema). For a Dataplex Entity, it is the Entity Schema (https://cloud.google.com/dataplex/docs/reference/rpc/google.cloud.dataplex.v1#type_3).
row_count
Type: INT64
Provider name: rowCount
Description: The count of rows scanned.
scanned_data
Type: STRUCT
Provider name: scannedData
Description: The data scanned for this result.
incremental_field
Type: STRUCT
Provider name: incrementalField
Description: The range denoted by values of an incremental field
end
Type: STRING
Provider name: end
Description: Value that marks the end of the range.
field
Type: STRING
Provider name: field
Description: The field that contains values which monotonically increase over time (e.g. a timestamp column).
start
Type: STRING
Provider name: start
Description: Value that marks the start of the range.
data_profile_spec
Type: STRUCT
Provider name: dataProfileSpec
Description: Settings for a data profile scan.
exclude_fields
Type: STRUCT
Provider name: excludeFields
Description: Optional. The fields to exclude from data profile. If specified, the fields will be excluded from data profile, regardless of include_fields value.
field_names
Type: UNORDERED_LIST_STRING
Provider name: fieldNames
Description: Optional. Expected input is a list of fully qualified names of fields as in the schema. Only top-level field names for nested fields are supported. For instance, if 'x' is of nested field type, listing 'x' is supported but 'x.y.z' is not supported. Here 'y' and 'y.z' are nested fields of 'x'.
include_fields
Type: STRUCT
Provider name: includeFields
Description: Optional. The fields to include in data profile. If not specified, all fields at the time of profile scan job execution are included, except for ones listed in exclude_fields.
field_names
Type: UNORDERED_LIST_STRING
Provider name: fieldNames
Description: Optional. Expected input is a list of fully qualified names of fields as in the schema. Only top-level field names for nested fields are supported. For instance, if 'x' is of nested field type, listing 'x' is supported but 'x.y.z' is not supported. Here 'y' and 'y.z' are nested fields of 'x'.
post_scan_actions
Type: STRUCT
Provider name: postScanActions
Description: Optional. Actions to take upon job completion.
bigquery_export
Type: STRUCT
Provider name: bigqueryExport
Description: Optional. If set, results will be exported to the provided BigQuery table.
results_table
Type: STRING
Provider name: resultsTable
Description: Optional. The BigQuery table to export DataProfileScan results to. Format: //bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID or projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID
row_filter
Type: STRING
Provider name: rowFilter
Description: Optional. A filter applied to all rows in a single DataScan job. The filter needs to be a valid SQL expression for a WHERE clause in GoogleSQL syntax (https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#where_clause). Example: col1 >= 0 AND col2 < 10
sampling_percent
Type: FLOAT
Provider name: samplingPercent
Description: Optional. The percentage of the records to be selected from the dataset for DataScan. Value can range between 0.0 and 100.0 with up to 3 significant decimal digits. Sampling is not applied if sampling_percent is not specified, or is 0 or 100.
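A hedged sketch of a complete dataProfileSpec combining the fields above; the table, dataset, and column names are placeholders:

```python
# Illustrative dataProfileSpec; identifiers are invented.
data_profile_spec = {
    "samplingPercent": 10.0,                    # profile roughly 10% of rows
    "rowFilter": "order_date >= '2024-01-01'",  # GoogleSQL WHERE-clause expression
    "includeFields": {"fieldNames": ["order_id", "amount", "customer"]},
    "excludeFields": {"fieldNames": ["customer"]},  # exclusion wins over inclusion
    "postScanActions": {
        "bigqueryExport": {
            "resultsTable": "//bigquery.googleapis.com/projects/my-project/datasets/dq/tables/profile_results"
        }
    },
}
```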
data_quality_result
Type: STRUCT
Provider name: dataQualityResult
Description: Output only. The result of a data quality scan.
columns
Type: UNORDERED_LIST_STRUCT
Provider name: columns
Description: Output only. A list of results at the column level. A column will have a corresponding DataQualityColumnResult if and only if there is at least one rule with the 'column' field set to it.
column
Type: STRING
Provider name: column
Description: Output only. The column specified in the DataQualityRule.
score
Type: FLOAT
Provider name: score
Description: Output only. The column-level data quality score for this data scan job if and only if the 'column' field is set. The score ranges between 0 and 100 (up to two decimal points).
dimensions
Type: UNORDERED_LIST_STRUCT
Provider name: dimensions
Description: Output only. A list of results at the dimension level. A dimension will have a corresponding DataQualityDimensionResult if and only if there is at least one rule with the 'dimension' field set to it.
dimension
Type: STRUCT
Provider name: dimension
Description: Output only. The dimension config specified in the DataQualitySpec, as is.
name
Type: STRING
Provider name: name
Description: Optional. The dimension name a rule belongs to. Custom dimension name is supported with all uppercase letters and maximum length of 30 characters.
passed
Type: BOOLEAN
Provider name: passed
Description: Output only. Whether the dimension passed or failed.
score
Type: FLOAT
Provider name: score
Description: Output only. The dimension-level data quality score for this data scan job if and only if the 'dimension' field is set. The score ranges between 0 and 100 (up to two decimal points).
passed
Type: BOOLEAN
Provider name: passed
Description: Output only. Overall data quality result – true if all rules passed.
post_scan_actions_result
Type: STRUCT
Provider name: postScanActionsResult
Description: Output only. The result of post scan actions.
bigquery_export_result
Type: STRUCT
Provider name: bigqueryExportResult
Description: Output only. The result of BigQuery export post scan action.
message
Type: STRING
Provider name: message
Description: Output only. Additional information about the BigQuery exporting.
state
Type: STRING
Provider name: state
Description: Output only. Execution state for the BigQuery exporting.
Possible values:
STATE_UNSPECIFIED
- The exporting state is unspecified.
SUCCEEDED
- The exporting completed successfully.
FAILED
- The exporting is no longer running due to an error.
SKIPPED
- The exporting is skipped because there is no valid scan result to export (usually caused by a failed scan).
row_count
Type: INT64
Provider name: rowCount
Description: Output only. The count of rows processed.
rules
Type: UNORDERED_LIST_STRUCT
Provider name: rules
Description: Output only. A list of all the rules in a job, and their results.
assertion_row_count
Type: INT64
Provider name: assertionRowCount
Description: Output only. The number of rows returned by the SQL statement in a SQL assertion rule. This field is only valid for SQL assertion rules.
evaluated_count
Type: INT64
Provider name: evaluatedCount
Description: Output only. The number of rows a rule was evaluated against. This field is only valid for row-level type rules. Evaluated count can be configured to either include all rows (default), with null rows automatically failing rule evaluation, or to exclude null rows from the evaluated_count by setting ignore_nulls = true. This field is not set for rule SqlAssertion.
failing_rows_query
Type: STRING
Provider name: failingRowsQuery
Description: Output only. The query to find rows that did not pass this rule. This field is only valid for row-level type rules.
null_count
Type: INT64
Provider name: nullCount
Description: Output only. The number of rows with null values in the specified column.
pass_ratio
Type: DOUBLE
Provider name: passRatio
Description: Output only. The ratio of passed_count / evaluated_count. This field is only valid for row-level type rules.
passed
Type: BOOLEAN
Provider name: passed
Description: Output only. Whether the rule passed or failed.
passed_count
Type: INT64
Provider name: passedCount
Description: Output only. The number of rows which passed a rule evaluation. This field is only valid for row-level type rules. This field is not set for rule SqlAssertion.
rule
Type: STRUCT
Provider name: rule
Description: Output only. The rule specified in the DataQualitySpec, as is.
column
Type: STRING
Provider name: column
Description: Optional. The unnested column which this rule is evaluated against.
description
Type: STRING
Provider name: description
Description: Optional. Description of the rule. The maximum length is 1,024 characters.
dimension
Type: STRING
Provider name: dimension
Description: Required. The dimension a rule belongs to. Results are also aggregated at the dimension level. Supported dimensions are "COMPLETENESS", "ACCURACY", "CONSISTENCY", "VALIDITY", "UNIQUENESS", "FRESHNESS", "VOLUME".
ignore_null
Type: BOOLEAN
Provider name: ignoreNull
Description: Optional. Rows with null values will automatically fail a rule, unless ignore_null is true. In that case, such null rows are trivially considered passing. This field is only valid for the following types of rules: RangeExpectation, RegexExpectation, SetExpectation, UniquenessExpectation.
name
Type: STRING
Provider name: name
Description: Optional. A mutable name for the rule. The name must contain only letters (a-z, A-Z), numbers (0-9), or hyphens (-). The maximum length is 63 characters. Must start with a letter. Must end with a number or a letter.
non_null_expectation
Type: STRUCT
Provider name: nonNullExpectation
Description: Row-level rule which evaluates whether each column value is null.
range_expectation
Type: STRUCT
Provider name: rangeExpectation
Description: Row-level rule which evaluates whether each column value lies between a specified range.
max_value
Type: STRING
Provider name: maxValue
Description: Optional. The maximum column value allowed for a row to pass this validation. At least one of min_value and max_value need to be provided.
min_value
Type: STRING
Provider name: minValue
Description: Optional. The minimum column value allowed for a row to pass this validation. At least one of min_value and max_value need to be provided.
strict_max_enabled
Type: BOOLEAN
Provider name: strictMaxEnabled
Description: Optional. Whether each value needs to be strictly lesser than ('<') the maximum, or if equality is allowed. Only relevant if a max_value has been defined. Default = false.
strict_min_enabled
Type: BOOLEAN
Provider name: strictMinEnabled
Description: Optional. Whether each value needs to be strictly greater than ('>') the minimum, or if equality is allowed. Only relevant if a min_value has been defined. Default = false.
regex_expectation
Type: STRUCT
Provider name: regexExpectation
Description: Row-level rule which evaluates whether each column value matches a specified regex.
regex
Type: STRING
Provider name: regex
Description: Optional. A regular expression the column value is expected to match.
row_condition_expectation
Type: STRUCT
Provider name: rowConditionExpectation
Description: Row-level rule which evaluates whether each row in a table passes the specified condition.
sql_expression
Type: STRING
Provider name: sqlExpression
Description: Optional. The SQL expression.
set_expectation
Type: STRUCT
Provider name: setExpectation
Description: Row-level rule which evaluates whether each column value is contained by a specified set.
values
Type: UNORDERED_LIST_STRING
Provider name: values
Description: Optional. Expected values for the column value.
sql_assertion
Type: STRUCT
Provider name: sqlAssertion
Description: Aggregate rule which evaluates the number of rows returned for the provided statement. If any rows are returned, this rule fails.
sql_statement
Type: STRING
Provider name: sqlStatement
Description: Optional. The SQL statement.
statistic_range_expectation
Type: STRUCT
Provider name: statisticRangeExpectation
Description: Aggregate rule which evaluates whether the column aggregate statistic lies between a specified range.
max_value
Type: STRING
Provider name: maxValue
Description: Optional. The maximum column statistic value allowed for a row to pass this validation. At least one of min_value and max_value need to be provided.
min_value
Type: STRING
Provider name: minValue
Description: Optional. The minimum column statistic value allowed for a row to pass this validation. At least one of min_value and max_value need to be provided.
statistic
Type: STRING
Provider name: statistic
Description: Optional. The aggregate metric to evaluate.
Possible values:
STATISTIC_UNDEFINED
- Unspecified statistic type
MEAN
- Evaluate the column mean
MIN
- Evaluate the column min
MAX
- Evaluate the column max
strict_max_enabled
Type: BOOLEAN
Provider name: strictMaxEnabled
Description: Optional. Whether column statistic needs to be strictly lesser than ('<') the maximum, or if equality is allowed. Only relevant if a max_value has been defined. Default = false.
strict_min_enabled
Type: BOOLEAN
Provider name: strictMinEnabled
Description: Optional. Whether column statistic needs to be strictly greater than ('>') the minimum, or if equality is allowed. Only relevant if a min_value has been defined. Default = false.
suspended
Type: BOOLEAN
Provider name: suspended
Description: Optional. Whether the Rule is active or suspended. Default is false.
table_condition_expectation
Type: STRUCT
Provider name: tableConditionExpectation
Description: Aggregate rule which evaluates whether the provided expression is true for a table.
sql_expression
Type: STRING
Provider name: sqlExpression
Description: Optional. The SQL expression.
threshold
Type: DOUBLE
Provider name: threshold
Description: Optional. The minimum ratio of passing_rows / total_rows required to pass this rule, with a range of [0.0, 1.0]. 0 indicates the default value (i.e. 1.0). This field is only valid for row-level type rules.
uniqueness_expectation
Type: STRUCT
Provider name: uniquenessExpectation
Description: Row-level rule which evaluates whether each column value is unique.
scanned_data
Type: STRUCT
Provider name: scannedData
Description: Output only. The data scanned for this result.
incremental_field
Type: STRUCT
Provider name: incrementalField
Description: The range denoted by values of an incremental field
end
Type: STRING
Provider name: end
Description: Value that marks the end of the range.
field
Type: STRING
Provider name: field
Description: The field that contains values which monotonically increase over time (e.g. a timestamp column).
start
Type: STRING
Provider name: start
Description: Value that marks the start of the range.
score
Type: FLOAT
Provider name: score
Description: Output only. The overall data quality score. The score ranges between 0 and 100 (up to two decimal points).
data_quality_spec
Type: STRUCT
Provider name: dataQualitySpec
Description: Settings for a data quality scan.
post_scan_actions
Type: STRUCT
Provider name: postScanActions
Description: Optional. Actions to take upon job completion.
bigquery_export
Type: STRUCT
Provider name: bigqueryExport
Description: Optional. If set, results will be exported to the provided BigQuery table.
results_table
Type: STRING
Provider name: resultsTable
Description: Optional. The BigQuery table to export DataQualityScan results to. Format: //bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID or projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID
notification_report
Type: STRUCT
Provider name: notificationReport
Description: Optional. If set, results will be sent to the provided notification recipients upon triggers.
job_end_trigger
Type: STRUCT
Provider name: jobEndTrigger
Description: Optional. If set, report will be sent when a scan job ends.
job_failure_trigger
Type: STRUCT
Provider name: jobFailureTrigger
Description: Optional. If set, report will be sent when a scan job fails.
recipients
Type: STRUCT
Provider name: recipients
Description: Required. The recipients who will receive the notification report.
emails
Type: UNORDERED_LIST_STRING
Provider name: emails
Description: Optional. The email recipients who will receive the DataQualityScan results report.
score_threshold_trigger
Type: STRUCT
Provider name: scoreThresholdTrigger
Description: Optional. If set, report will be sent when score threshold is met.
score_threshold
Type: FLOAT
Provider name: scoreThreshold
Description: Optional. The score range is in [0, 100].
row_filter
Type: STRING
Provider name: rowFilter
Description: Optional. A filter applied to all rows in a single DataScan job. The filter needs to be a valid SQL expression for a WHERE clause in GoogleSQL syntax (https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#where_clause). Example: col1 >= 0 AND col2 < 10
rules
Type: UNORDERED_LIST_STRUCT
Provider name: rules
Description: Required. The list of rules to evaluate against a data source. At least one rule is required.
column
Type: STRING
Provider name: column
Description: Optional. The unnested column which this rule is evaluated against.
description
Type: STRING
Provider name: description
Description: Optional. Description of the rule. The maximum length is 1,024 characters.
dimension
Type: STRING
Provider name: dimension
Description: Required. The dimension a rule belongs to. Results are also aggregated at the dimension level. Supported dimensions are "COMPLETENESS", "ACCURACY", "CONSISTENCY", "VALIDITY", "UNIQUENESS", "FRESHNESS", "VOLUME".
ignore_null
Type: BOOLEAN
Provider name: ignoreNull
Description: Optional. Rows with null values will automatically fail a rule, unless ignore_null is true. In that case, such null rows are trivially considered passing. This field is only valid for the following types of rules: RangeExpectation, RegexExpectation, SetExpectation, UniquenessExpectation.
name
Type: STRING
Provider name: name
Description: Optional. A mutable name for the rule. The name must contain only letters (a-z, A-Z), numbers (0-9), or hyphens (-). The maximum length is 63 characters. Must start with a letter. Must end with a number or a letter.
non_null_expectation
Type: STRUCT
Provider name: nonNullExpectation
Description: Row-level rule which evaluates whether each column value is null.
range_expectation
Type: STRUCT
Provider name: rangeExpectation
Description: Row-level rule which evaluates whether each column value lies between a specified range.
max_value
Type: STRING
Provider name: maxValue
Description: Optional. The maximum column value allowed for a row to pass this validation. At least one of min_value and max_value need to be provided.
min_value
Type: STRING
Provider name: minValue
Description: Optional. The minimum column value allowed for a row to pass this validation. At least one of min_value and max_value need to be provided.
strict_max_enabled
Type: BOOLEAN
Provider name: strictMaxEnabled
Description: Optional. Whether each value needs to be strictly lesser than ('<') the maximum, or if equality is allowed. Only relevant if a max_value has been defined. Default = false.
strict_min_enabled
Type: BOOLEAN
Provider name: strictMinEnabled
Description: Optional. Whether each value needs to be strictly greater than ('>') the minimum, or if equality is allowed. Only relevant if a min_value has been defined. Default = false.
regex_expectation
Type: STRUCT
Provider name: regexExpectation
Description: Row-level rule which evaluates whether each column value matches a specified regex.
regex
Type: STRING
Provider name: regex
Description: Optional. A regular expression the column value is expected to match.
row_condition_expectation
Type: STRUCT
Provider name: rowConditionExpectation
Description: Row-level rule which evaluates whether each row in a table passes the specified condition.
sql_expression
Type: STRING
Provider name: sqlExpression
Description: Optional. The SQL expression.
set_expectation
Type: STRUCT
Provider name: setExpectation
Description: Row-level rule which evaluates whether each column value is contained by a specified set.
values
Type: UNORDERED_LIST_STRING
Provider name: values
Description: Optional. Expected values for the column value.
sql_assertion
Type: STRUCT
Provider name: sqlAssertion
Description: Aggregate rule which evaluates the number of rows returned for the provided statement. If any rows are returned, this rule fails.
sql_statement
Type: STRING
Provider name: sqlStatement
Description: Optional. The SQL statement.
statistic_range_expectation
Type: STRUCT
Provider name: statisticRangeExpectation
Description: Aggregate rule which evaluates whether the column aggregate statistic lies between a specified range.
max_value
Type: STRING
Provider name: maxValue
Description: Optional. The maximum column statistic value allowed for a row to pass this validation. At least one of min_value and max_value need to be provided.
min_value
Type: STRING
Provider name: minValue
Description: Optional. The minimum column statistic value allowed for a row to pass this validation. At least one of min_value and max_value need to be provided.
statistic
Type: STRING
Provider name: statistic
Description: Optional. The aggregate metric to evaluate.
Possible values:
STATISTIC_UNDEFINED
- Unspecified statistic type
MEAN
- Evaluate the column mean
MIN
- Evaluate the column min
MAX
- Evaluate the column max
strict_max_enabled
Type: BOOLEAN
Provider name: strictMaxEnabled
Description: Optional. Whether column statistic needs to be strictly lesser than ('<') the maximum, or if equality is allowed. Only relevant if a max_value has been defined. Default = false.
strict_min_enabled
Type: BOOLEAN
Provider name: strictMinEnabled
Description: Optional. Whether column statistic needs to be strictly greater than ('>') the minimum, or if equality is allowed. Only relevant if a min_value has been defined. Default = false.
suspended
Type: BOOLEAN
Provider name: suspended
Description: Optional. Whether the Rule is active or suspended. Default is false.
table_condition_expectation
Type: STRUCT
Provider name: tableConditionExpectation
Description: Aggregate rule which evaluates whether the provided expression is true for a table.
sql_expression
Type: STRING
Provider name: sqlExpression
Description: Optional. The SQL expression.
threshold
Type: DOUBLE
Provider name: threshold
Description: Optional. The minimum ratio of passing_rows / total_rows required to pass this rule, with a range of [0.0, 1.0]. 0 indicates the default value (i.e. 1.0). This field is only valid for row-level type rules.
uniqueness_expectation
Type: STRUCT
Provider name: uniquenessExpectation
Description: Row-level rule which evaluates whether each column value is unique.
sampling_percent
Type: FLOAT
Provider name: samplingPercent
Description: Optional. The percentage of the records to be selected from the dataset for DataScan. Value can range between 0.0 and 100.0 with up to 3 significant decimal digits. Sampling is not applied if sampling_percent is not specified, or is 0 or 100.
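As a sketch of how these pieces fit together, a hedged dataQualitySpec with a few representative rule types; the column, table, and email values are placeholders, not values taken from this reference:

```python
# Illustrative dataQualitySpec; all identifiers are invented.
data_quality_spec = {
    "samplingPercent": 100.0,
    "rowFilter": "ingest_date = CURRENT_DATE()",
    "rules": [
        {   # Row-level completeness: amount must be non-null in >= 99% of rows.
            "column": "amount",
            "dimension": "COMPLETENESS",
            "nonNullExpectation": {},
            "threshold": 0.99,
        },
        {   # Row-level validity: amount in [0, 10000), nulls ignored.
            "column": "amount",
            "dimension": "VALIDITY",
            "ignoreNull": True,
            "rangeExpectation": {
                "minValue": "0",
                "maxValue": "10000",
                "strictMaxEnabled": True,
            },
        },
        {   # Aggregate SQL assertion: fails if the statement returns any rows.
            "dimension": "CONSISTENCY",
            "sqlAssertion": {
                "sqlStatement": "SELECT order_id FROM `my-project.sales.orders` WHERE amount < 0"
            },
        },
    ],
    "postScanActions": {
        "notificationReport": {
            "recipients": {"emails": ["data-owners@example.com"]},
            "scoreThresholdTrigger": {"scoreThreshold": 90.0},
        }
    },
}
```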
description
Type: STRING
Provider name: description
Description: Optional. Description of the scan. Must be between 1 and 1,024 characters.
execution_spec
Type: STRUCT
Provider name: executionSpec
Description: Optional. DataScan execution settings. If not specified, the fields in it will use their default values.
field
Type: STRING
Provider name: field
Description: Immutable. The unnested field (of type Date or Timestamp) that contains values which monotonically increase over time. If not specified, a data scan will run for all data in the table.
trigger
Type: STRUCT
Provider name: trigger
Description: Optional. Spec related to how often and when a scan should be triggered. If not specified, the default is OnDemand, which means the scan will not run until the user calls the RunDataScan API.
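The trigger sub-fields are not expanded in this listing; in the Dataplex v1 API a trigger is either on-demand or a cron schedule. A hedged sketch of both shapes:

```python
# Assumed shapes for executionSpec.trigger; verify against the Dataplex v1 API.
on_demand = {"trigger": {"onDemand": {}}}
scheduled = {"trigger": {"schedule": {"cron": "TZ=UTC 0 3 * * *"}}}  # daily at 03:00 UTC
```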
execution_status
Type: STRUCT
Provider name: executionStatus
Description: Output only. Status of the data scan execution.
latest_job_create_time
Type: TIMESTAMP
Provider name: latestJobCreateTime
Description: Optional. The time when the DataScanJob execution was created.
latest_job_end_time
Type: TIMESTAMP
Provider name: latestJobEndTime
Description: Optional. The time when the latest DataScanJob ended.
latest_job_start_time
Type: TIMESTAMP
Provider name: latestJobStartTime
Description: Optional. The time when the latest DataScanJob started.
gcp_display_name
Type: STRING
Provider name: displayName
Description: Optional. User-friendly display name. Must be between 1 and 256 characters.
labels
Type: UNORDERED_LIST_STRING
name
Type: STRING
Provider name: name
Description: Output only. Identifier. The relative resource name of the scan, of the form: projects/{project}/locations/{location_id}/dataScans/{datascan_id}, where project refers to a project_id or project_number and location_id refers to a GCP region.
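A minimal sketch of reading a scan back by this relative name, assuming the google-cloud-dataplex Python client is installed; the project, location, and scan IDs are placeholders:

```python
from google.cloud import dataplex_v1

client = dataplex_v1.DataScanServiceClient()
name = "projects/my-project/locations/us-central1/dataScans/orders-quality-scan"
scan = client.get_data_scan(name=name)  # returns the DataScan resource described here
print(scan.uid, scan.state)
```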
organization_id
Type: STRING
parent
Type: STRING
project_id
Type: STRING
project_number
Type: STRING
resource_name
Type: STRING
state
Type: STRING
Provider name: state
Description: Output only. Current state of the DataScan.
Possible values:
STATE_UNSPECIFIED
- State is not specified.
ACTIVE
- Resource is active, i.e., ready to use.
CREATING
- Resource is under creation.
DELETING
- Resource is under deletion.
ACTION_REQUIRED
- Resource is active but has unresolved actions.
tags
Type: UNORDERED_LIST_STRING
type
Type: STRING
Provider name: type
Description: Output only. The type of DataScan.
Possible values:
DATA_SCAN_TYPE_UNSPECIFIED
- The data scan type is unspecified.
DATA_QUALITY
- Data quality scan.
DATA_PROFILE
- Data profile scan.
DATA_DISCOVERY
- Data discovery scan.
uid
Type: STRING
Provider name: uid
Description: Output only. System generated globally unique ID for the scan. This ID will be different if the scan is deleted and re-created with the same name.
update_time
Type: TIMESTAMP
Provider name: updateTime
Description: Output only. The time when the scan was last updated.