gcp_dataplex_data_scan

ancestors

Type: UNORDERED_LIST_STRING

create_time

Type: TIMESTAMP
Provider name: createTime
Description: Output only. The time when the scan was created.

data

Type: STRUCT
Provider name: data
Description: Required. The data source for DataScan.

  • entity
    Type: STRING
    Provider name: entity
    Description: Immutable. The Dataplex entity that represents the data source (e.g. BigQuery table) for DataScan, of the form: projects/{project_number}/locations/{location_id}/lakes/{lake_id}/zones/{zone_id}/entities/{entity_id}.
  • resource
    Type: STRING
    Provider name: resource
    Description: Immutable. The service-qualified full resource name of the cloud resource for a DataScan job to scan against. The field could be a BigQuery table of type “TABLE” for DataProfileScan/DataQualityScan. Format: //bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID
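
The two name formats quoted above can be checked before a value is used elsewhere. The following is a minimal Python sketch, not part of any provider API: the function name parse_data_source and the choice to accept any non-slash characters in each path segment are assumptions made for illustration.

```python
import re

# Patterns transcribed from the formats quoted in the `entity` and `resource`
# descriptions. Allowing any non-"/" characters per segment is an assumption.
ENTITY_RE = re.compile(
    r"^projects/(?P<project_number>[^/]+)/locations/(?P<location_id>[^/]+)"
    r"/lakes/(?P<lake_id>[^/]+)/zones/(?P<zone_id>[^/]+)"
    r"/entities/(?P<entity_id>[^/]+)$"
)
RESOURCE_RE = re.compile(
    r"^//bigquery\.googleapis\.com/projects/(?P<project_id>[^/]+)"
    r"/datasets/(?P<dataset_id>[^/]+)/tables/(?P<table_id>[^/]+)$"
)


def parse_data_source(data: dict) -> dict:
    """Hypothetical helper: split whichever of `entity` / `resource` is set."""
    for key, pattern in (("entity", ENTITY_RE), ("resource", RESOURCE_RE)):
        value = data.get(key)
        if value:
            match = pattern.match(value)
            if match is None:
                raise ValueError(f"{key} does not match the documented format: {value!r}")
            return {"kind": key, **match.groupdict()}
    raise ValueError("neither entity nor resource is set")
```

For example, passing {"resource": "//bigquery.googleapis.com/projects/my-proj/datasets/sales/tables/orders"} yields the project, dataset, and table identifiers as separate keys.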

data_discovery_result

Type: STRUCT
Provider name: dataDiscoveryResult
Description: Output only. The result of a data discovery scan.

  • bigquery_publishing
    Type: STRUCT
    Provider name: bigqueryPublishing
    Description: Output only. Configuration for metadata publishing.
    • dataset
      Type: STRING
      Provider name: dataset
      Description: Output only. The BigQuery dataset the discovered tables are published to.
    • location
      Type: STRING
      Provider name: location
      Description: Output only. The location of the BigQuery publishing dataset.
  • scan_statistics
    Type: STRUCT
    Provider name: scanStatistics
    Description: Output only. Statistics of the DataDiscoveryScan.
    • data_processed_bytes
      Type: INT64
      Provider name: dataProcessedBytes
      Description: The data processed in bytes.
    • files_excluded
      Type: INT32
      Provider name: filesExcluded
      Description: The number of files excluded.
    • filesets_created
      Type: INT32
      Provider name: filesetsCreated
      Description: The number of filesets created.
    • filesets_deleted
      Type: INT32
      Provider name: filesetsDeleted
      Description: The number of filesets deleted.
    • filesets_updated
      Type: INT32
      Provider name: filesetsUpdated
      Description: The number of filesets updated.
    • scanned_file_count
      Type: INT32
      Provider name: scannedFileCount
      Description: The number of files scanned.
    • tables_created
      Type: INT32
      Provider name: tablesCreated
      Description: The number of tables created.
    • tables_deleted
      Type: INT32
      Provider name: tablesDeleted
      Description: The number of tables deleted.
    • tables_updated
      Type: INT32
      Provider name: tablesUpdated
      Description: The number of tables updated.

data_discovery_spec

Type: STRUCT
Provider name: dataDiscoverySpec
Description: Settings for a data discovery scan.

  • bigquery_publishing_config
    Type: STRUCT
    Provider name: bigqueryPublishingConfig
    Description: Optional. Configuration for metadata publishing.
    • connection
      Type: STRING
      Provider name: connection
      Description: Optional. The BigQuery connection used to create BigLake tables. Must be in the form projects/{project_id}/locations/{location_id}/connections/{connection_id}
    • location
      Type: STRING
      Provider name: location
      Description: Optional. The location of the BigQuery dataset to publish BigLake external or non-BigLake external tables to.
      1. If the Cloud Storage bucket is located in a multi-region bucket, then the BigQuery dataset can be in the same multi-region bucket or any single region that is included in the same multi-region bucket. The datascan can be created in any single region that is included in the same multi-region bucket.
      2. If the Cloud Storage bucket is located in a dual-region bucket, then the BigQuery dataset can be located in regions that are included in the dual-region bucket, or in a multi-region that includes the dual-region. The datascan can be created in any single region that is included in the same dual-region bucket.
      3. If the Cloud Storage bucket is located in a single region, then the BigQuery dataset can be in the same single region or any multi-region bucket that includes the same single region. The datascan will be created in the same single region as the bucket.
      4. If the BigQuery dataset is in a single region, it must be in the same single region as the datascan.
      For supported values, refer to https://cloud.google.com/bigquery/docs/locations#supported_locations.
    • table_type
      Type: STRING
      Provider name: tableType
      Description: Optional. Determines whether to publish discovered tables as BigLake external tables or non-BigLake external tables.
      Possible values:
      • TABLE_TYPE_UNSPECIFIED - Table type unspecified.
      • EXTERNAL - Default. Discovered tables are published as BigQuery external tables whose data is accessed using the credentials of the user querying the table.
      • BIGLAKE - Discovered tables are published as BigLake external tables whose data is accessed using the credentials of the associated BigQuery connection.
  • storage_config
    Type: STRUCT
    Provider name: storageConfig
    Description: Cloud Storage related configurations.
    • csv_options
      Type: STRUCT
      Provider name: csvOptions
      Description: Optional. Configuration for CSV data.
      • delimiter
        Type: STRING
        Provider name: delimiter
        Description: Optional. The delimiter that is used to separate values. The default is , (comma).
      • encoding
        Type: STRING
        Provider name: encoding
        Description: Optional. The character encoding of the data. The default is UTF-8.
      • header_rows
        Type: INT32
        Provider name: headerRows
        Description: Optional. The number of rows to interpret as header rows that should be skipped when reading data rows.
      • quote
        Type: STRING
        Provider name: quote
        Description: Optional. The character used to quote column values. Accepts " (double quotation mark) or ' (single quotation mark). If unspecified, defaults to " (double quotation mark).
      • type_inference_disabled
        Type: BOOLEAN
        Provider name: typeInferenceDisabled
        Description: Optional. Whether to disable the inference of data types for CSV data. If true, all columns are registered as strings.
    • exclude_patterns
      Type: UNORDERED_LIST_STRING
      Provider name: excludePatterns
      Description: Optional. Defines the data to exclude during discovery. Provide a list of patterns that identify the data to exclude. For Cloud Storage bucket assets, these patterns are interpreted as glob patterns used to match object names. For BigQuery dataset assets, these patterns are interpreted as patterns to match table names.
    • include_patterns
      Type: UNORDERED_LIST_STRING
      Provider name: includePatterns
      Description: Optional. Defines the data to include during discovery when only a subset of the data should be considered. Provide a list of patterns that identify the data to include. For Cloud Storage bucket assets, these patterns are interpreted as glob patterns used to match object names. For BigQuery dataset assets, these patterns are interpreted as patterns to match table names.
    • json_options
      Type: STRUCT
      Provider name: jsonOptions
      Description: Optional. Configuration for JSON data.
      • encoding
        Type: STRING
        Provider name: encoding
        Description: Optional. The character encoding of the data. The default is UTF-8.
      • type_inference_disabled
        Type: BOOLEAN
        Provider name: typeInferenceDisabled
        Description: Optional. Whether to disable the inference of data types for JSON data. If true, all columns are registered as their primitive types (string, number, or boolean).
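
The csv_options fields above map closely onto ordinary CSV parsing. The sketch below uses Python's standard csv module to illustrate how delimiter, quote, header_rows, and type_inference_disabled interact. It is only an illustration: the function names and the int/float inference rule are hypothetical and do not describe how the discovery scan itself infers types.

```python
import csv
import io


def _infer(value: str):
    """Hypothetical inference rule: try int, then float, else keep the string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def read_data_rows(text, delimiter=",", quote='"', header_rows=0,
                   type_inference_disabled=False):
    """Parse CSV text, skip the header rows, and optionally infer cell types."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quote)
    rows = list(reader)[header_rows:]  # header rows are skipped, not returned
    if type_inference_disabled:
        return rows  # every column stays a string
    return [[_infer(cell) for cell in row] for row in rows]
```

With header_rows=1, the first line is dropped before any data row is returned, and with type_inference_disabled=True every cell is left as a string.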

data_profile_result

Type: STRUCT
Provider name: dataProfileResult
Description: Output only. The result of a data profile scan.

  • post_scan_actions_result
    Type: STRUCT
    Provider name: postScanActionsResult
    Description: Output only. The result of post scan actions.
    • bigquery_export_result
      Type: STRUCT
      Provider name: bigqueryExportResult
      Description: Output only. The result of BigQuery export post scan action.
      • message
        Type: STRING
        Provider name: message
        Description: Output only. Additional information about the BigQuery exporting.
      • state
        Type: STRING
        Provider name: state
        Description: Output only. Execution state for the BigQuery exporting.
        Possible values:
        • STATE_UNSPECIFIED - The exporting state is unspecified.
        • SUCCEEDED - The exporting completed successfully.
        • FAILED - The exporting is no longer running due to an error.
        • SKIPPED - The exporting is skipped due to no valid scan result to export (usually caused by scan failed).
  • profile
    Type: STRUCT
    Provider name: profile
    Description: The profile information per field.
    • fields
      Type: UNORDERED_LIST_STRUCT
      Provider name: fields
      Description: List of fields with structural and profile information for each field.
      • mode
        Type: STRING
        Provider name: mode
        Description: The mode of the field. Possible values include: REQUIRED, if it is a required field. NULLABLE, if it is an optional field. REPEATED, if it is a repeated field.
      • name
        Type: STRING
        Provider name: name
        Description: The name of the field.
      • profile
        Type: STRUCT
        Provider name: profile
        Description: Profile information for the corresponding field.
        • distinct_ratio
          Type: DOUBLE
          Provider name: distinctRatio
          Description: Ratio of rows with distinct values against total scanned rows. Not available for complex non-groupable field types, including RECORD, ARRAY, GEOGRAPHY, and JSON, as well as fields with REPEATED mode.
        • double_profile
          Type: STRUCT
          Provider name: doubleProfile
          Description: Double type field information.
          • average
            Type: DOUBLE
            Provider name: average
            Description: Average of non-null values in the scanned data. NaN, if the field has a NaN.
          • max
            Type: DOUBLE
            Provider name: max
            Description: Maximum of non-null values in the scanned data. NaN, if the field has a NaN.
          • min
            Type: DOUBLE
            Provider name: min
            Description: Minimum of non-null values in the scanned data. NaN, if the field has a NaN.
          • quartiles
            Type: UNORDERED_LIST_DOUBLE
            Provider name: quartiles
            Description: A quartile divides the number of data points into four parts, or quarters, of more-or-less equal size. Three main quartiles used are: The first quartile (Q1) splits off the lowest 25% of data from the highest 75%. It is also known as the lower or 25th empirical quartile, as 25% of the data is below this point. The second quartile (Q2) is the median of a data set. So, 50% of the data lies below this point. The third quartile (Q3) splits off the highest 25% of data from the lowest 75%. It is known as the upper or 75th empirical quartile, as 75% of the data lies below this point. Here, the quartiles are provided as an ordered list of quartile values for the scanned data, occurring in order Q1, median, Q3.
          • standard_deviation
            Type: DOUBLE
            Provider name: standardDeviation
            Description: Standard deviation of non-null values in the scanned data. NaN, if the field has a NaN.
        • integer_profile
          Type: STRUCT
          Provider name: integerProfile
          Description: Integer type field information.
          • average
            Type: DOUBLE
            Provider name: average
            Description: Average of non-null values in the scanned data. NaN, if the field has a NaN.
          • max
            Type: INT64
            Provider name: max
            Description: Maximum of non-null values in the scanned data. NaN, if the field has a NaN.
          • min
            Type: INT64
            Provider name: min
            Description: Minimum of non-null values in the scanned data. NaN, if the field has a NaN.
          • quartiles
            Type: UNORDERED_LIST_INT64
            Provider name: quartiles
            Description: A quartile divides the number of data points into four parts, or quarters, of more-or-less equal size. Three main quartiles used are: The first quartile (Q1) splits off the lowest 25% of data from the highest 75%. It is also known as the lower or 25th empirical quartile, as 25% of the data is below this point. The second quartile (Q2) is the median of a data set. So, 50% of the data lies below this point. The third quartile (Q3) splits off the highest 25% of data from the lowest 75%. It is known as the upper or 75th empirical quartile, as 75% of the data lies below this point. Here, the quartiles are provided as an ordered list of approximate quartile values for the scanned data, occurring in order Q1, median, Q3.
          • standard_deviation
            Type: DOUBLE
            Provider name: standardDeviation
            Description: Standard deviation of non-null values in the scanned data. NaN, if the field has a NaN.
        • null_ratio
          Type: DOUBLE
          Provider name: nullRatio
          Description: Ratio of rows with null value against total scanned rows.
        • string_profile
          Type: STRUCT
          Provider name: stringProfile
          Description: String type field information.
          • average_length
            Type: DOUBLE
            Provider name: averageLength
            Description: Average length of non-null values in the scanned data.
          • max_length
            Type: INT64
            Provider name: maxLength
            Description: Maximum length of non-null values in the scanned data.
          • min_length
            Type: INT64
            Provider name: minLength
            Description: Minimum length of non-null values in the scanned data.
        • top_n_values
          Type: UNORDERED_LIST_STRUCT
          Provider name: topNValues
          Description: The list of top N non-null values, with the frequency and ratio with which they occur in the scanned data. N is 10 or equal to the number of distinct values in the field, whichever is smaller. Not available for complex non-groupable field types, including RECORD, ARRAY, GEOGRAPHY, and JSON, as well as fields with REPEATED mode.
          • count
            Type: INT64
            Provider name: count
            Description: Count of the corresponding value in the scanned data.
          • ratio
            Type: DOUBLE
            Provider name: ratio
            Description: Ratio of the corresponding value in the field against the total number of rows in the scanned data.
          • value
            Type: STRING
            Provider name: value
            Description: String value of a top N non-null value.
      • type
        Type: STRING
        Provider name: type
        Description: The data type retrieved from the schema of the data source. For instance, for a BigQuery native table, it is the BigQuery Table Schema (https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#tablefieldschema). For a Dataplex Entity, it is the Entity Schema (https://cloud.google.com/dataplex/docs/reference/rpc/google.cloud.dataplex.v1#type_3).
  • row_count
    Type: INT64
    Provider name: rowCount
    Description: The count of rows scanned.
  • scanned_data
    Type: STRUCT
    Provider name: scannedData
    Description: The data scanned for this result.
    • incremental_field
      Type: STRUCT
      Provider name: incrementalField
      Description: The range denoted by values of an incremental field.
      • end
        Type: STRING
        Provider name: end
        Description: Value that marks the end of the range.
      • field
        Type: STRING
        Provider name: field
        Description: The field that contains values that monotonically increase over time (e.g. a timestamp column).
      • start
        Type: STRING
        Provider name: start
        Description: Value that marks the start of the range.
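
To make the per-field profile statistics above concrete, the following Python sketch computes null_ratio, distinct_ratio, and top_n_values for one column held in memory. It is a rough illustration only: the function name profile_column is hypothetical, and treating distinct_ratio as the number of distinct non-null values divided by the total row count is an assumed reading of the description, not a confirmed definition.

```python
from collections import Counter


def profile_column(values, n=10):
    """Hypothetical helper: summarize one column given as a list (None = null)."""
    total = len(values)
    if total == 0:
        raise ValueError("no rows scanned")
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    # N is 10 or the number of distinct values, whichever is smaller.
    top = [
        {"value": str(value), "count": count, "ratio": count / total}
        for value, count in counts.most_common(min(n, len(counts)))
    ]
    return {
        "null_ratio": (total - len(non_null)) / total,
        # Assumption: distinct non-null values over total scanned rows.
        "distinct_ratio": len(counts) / total,
        "top_n_values": top,
    }
```

Each top_n_values entry carries the same three keys as the schema (value, count, ratio), with the ratio taken against the total number of scanned rows.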

data_profile_spec

Type: STRUCT
Provider name: dataProfileSpec
Description: Settings for a data profile scan.

  • exclude_fields
    Type: STRUCT
    Provider name: excludeFields
    Description: Optional. The fields to exclude from data profile. If specified, the fields will be excluded from data profile, regardless of include_fields value.
    • field_names
      Type: UNORDERED_LIST_STRING
      Provider name: fieldNames
      Description: Optional. Expected input is a list of fully qualified names of fields as in the schema. Only top-level field names for nested fields are supported. For instance, if ‘x’ is of nested field type, listing ‘x’ is supported but ‘x.y.z’ is not supported. Here ‘y’ and ‘y.z’ are nested fields of ‘x’.
  • include_fields
    Type: STRUCT
    Provider name: includeFields
    Description: Optional. The fields to include in data profile. If not specified, all fields at the time of profile scan job execution are included, except for ones listed in exclude_fields.
    • field_names
      Type: UNORDERED_LIST_STRING
      Provider name: fieldNames
      Description: Optional. Expected input is a list of fully qualified names of fields as in the schema. Only top-level field names for nested fields are supported. For instance, if ‘x’ is of nested field type, listing ‘x’ is supported but ‘x.y.z’ is not supported. Here ‘y’ and ‘y.z’ are nested fields of ‘x’.
  • post_scan_actions
    Type: STRUCT
    Provider name: postScanActions
    Description: Optional. Actions to take upon job completion.
    • bigquery_export
      Type: STRUCT
      Provider name: bigqueryExport
      Description: Optional. If set, results will be exported to the provided BigQuery table.
      • results_table
        Type: STRING
        Provider name: resultsTable
        Description: Optional. The BigQuery table to export DataProfileScan results to. Format: //bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID or projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID
  • row_filter
    Type: STRING
    Provider name: rowFilter
    Description: Optional. A filter applied to all rows in a single DataScan job. The filter needs to be a valid SQL expression for a WHERE clause in GoogleSQL syntax (https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#where_clause). Example: col1 >= 0 AND col2 < 10
  • sampling_percent
    Type: FLOAT
    Provider name: samplingPercent
    Description: Optional. The percentage of the records to be selected from the dataset for DataScan. Value can range between 0.0 and 100.0 with up to 3 significant decimal digits. Sampling is not applied if sampling_percent is not specified, 0 or 100.
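
The interaction between include_fields, exclude_fields, and sampling_percent described above can be summarized in a few lines. This is a minimal Python sketch of the stated rules, using hypothetical helper names (fields_to_profile, sampling_applies) and plain lists in place of the field_names structs.

```python
def fields_to_profile(schema_fields, include_fields=None, exclude_fields=None):
    """Hypothetical helper: resolve which top-level fields get profiled."""
    # If include_fields is not specified, all schema fields are candidates.
    candidates = list(include_fields) if include_fields else list(schema_fields)
    # exclude_fields wins regardless of the include_fields value.
    excluded = set(exclude_fields or ())
    return [name for name in candidates if name not in excluded]


def sampling_applies(sampling_percent=None):
    """Return True only when the value actually causes sampling."""
    if sampling_percent is None:
        return False
    if not 0.0 <= sampling_percent <= 100.0:
        raise ValueError("sampling_percent must be between 0.0 and 100.0")
    # Sampling is not applied for 0 or 100.
    return sampling_percent not in (0.0, 100.0)
```

A field listed in both include_fields and exclude_fields is therefore dropped, and a sampling_percent of 0 behaves the same as leaving it unset.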

data_quality_result

Type: STRUCT
Provider name: dataQualityResult
Description: Output only. The result of a data quality scan.

  • columns
    Type: UNORDERED_LIST_STRUCT
    Provider name: columns
    Description: Output only. A list of results at the column level. A column will have a corresponding DataQualityColumnResult if and only if there is at least one rule with the ‘column’ field set to it.
    • column
      Type: STRING
      Provider name: column
      Description: Output only. The column specified in the DataQualityRule.
    • score
      Type: FLOAT
      Provider name: score
      Description: Output only. The column-level data quality score for this data scan job if and only if the ‘column’ field is set. The score ranges between 0 and 100 (up to two decimal points).
  • dimensions
    Type: UNORDERED_LIST_STRUCT
    Provider name: dimensions
    Description: Output only. A list of results at the dimension level. A dimension will have a corresponding DataQualityDimensionResult if and only if there is at least one rule with the ‘dimension’ field set to it.
    • dimension
      Type: STRUCT
      Provider name: dimension
      Description: Output only. The dimension config specified in the DataQualitySpec, as is.
      • name
        Type: STRING
        Provider name: name
        Description: Optional. The dimension name a rule belongs to. Custom dimension name is supported with all uppercase letters and maximum length of 30 characters.
    • passed
      Type: BOOLEAN
      Provider name: passed
      Description: Output only. Whether the dimension passed or failed.
    • score
      Type: FLOAT
      Provider name: score
      Description: Output only. The dimension-level data quality score for this data scan job if and only if the ‘dimension’ field is set. The score ranges between 0 and 100 (up to two decimal points).
  • passed
    Type: BOOLEAN
    Provider name: passed
    Description: Output only. Overall data quality result – true if all rules passed.
  • post_scan_actions_result
    Type: STRUCT
    Provider name: postScanActionsResult
    Description: Output only. The result of post scan actions.
    • bigquery_export_result
      Type: STRUCT
      Provider name: bigqueryExportResult
      Description: Output only. The result of BigQuery export post scan action.
      • message
        Type: STRING
        Provider name: message
        Description: Output only. Additional information about the BigQuery exporting.
      • state
        Type: STRING
        Provider name: state
        Description: Output only. Execution state for the BigQuery exporting.
        Possible values:
        • STATE_UNSPECIFIED - The exporting state is unspecified.
        • SUCCEEDED - The exporting completed successfully.
        • FAILED - The exporting is no longer running due to an error.
        • SKIPPED - The exporting is skipped due to no valid scan result to export (usually caused by scan failed).
  • row_count
    Type: INT64
    Provider name: rowCount
    Description: Output only. The count of rows processed.
  • rules
    Type: UNORDERED_LIST_STRUCT
    Provider name: rules
    Description: Output only. A list of all the rules in a job, and their results.
    • assertion_row_count
      Type: INT64
      Provider name: assertionRowCount
      Description: Output only. The number of rows returned by the SQL statement in a SQL assertion rule. This field is only valid for SQL assertion rules.
    • evaluated_count
      Type: INT64
      Provider name: evaluatedCount
      Description: Output only. The number of rows a rule was evaluated against. This field is only valid for row-level type rules. Evaluated count can be configured to either include all rows (default), with null rows automatically failing rule evaluation, or exclude null rows from the evaluated_count, by setting ignore_null = true. This field is not set for rule SqlAssertion.
    • failing_rows_query
      Type: STRING
      Provider name: failingRowsQuery
      Description: Output only. The query to find rows that did not pass this rule. This field is only valid for row-level type rules.
    • null_count
      Type: INT64
      Provider name: nullCount
      Description: Output only. The number of rows with null values in the specified column.
    • pass_ratio
      Type: DOUBLE
      Provider name: passRatio
      Description: Output only. The ratio of passed_count / evaluated_count. This field is only valid for row-level type rules.
    • passed
      Type: BOOLEAN
      Provider name: passed
      Description: Output only. Whether the rule passed or failed.
    • passed_count
      Type: INT64
      Provider name: passedCount
      Description: Output only. The number of rows which passed a rule evaluation. This field is only valid for row-level type rules. This field is not set for rule SqlAssertion.
    • rule
      Type: STRUCT
      Provider name: rule
      Description: Output only. The rule specified in the DataQualitySpec, as is.
      • column
        Type: STRING
        Provider name: column
        Description: Optional. The unnested column which this rule is evaluated against.

      • description
        Type: STRING
        Provider name: description
        Description: Optional. Description of the rule. The maximum length is 1,024 characters.

      • dimension
        Type: STRING
        Provider name: dimension
        Description: Required. The dimension a rule belongs to. Results are also aggregated at the dimension level. Supported dimensions are “COMPLETENESS”, “ACCURACY”, “CONSISTENCY”, “VALIDITY”, “UNIQUENESS”, “FRESHNESS”, and “VOLUME”.

      • ignore_null
        Type: BOOLEAN
        Provider name: ignoreNull
        Description: Optional. Rows with null values will automatically fail a rule, unless ignore_null is true. In that case, such null rows are trivially considered passing. This field is only valid for the following types of rules: RangeExpectation, RegexExpectation, SetExpectation, UniquenessExpectation.

      • name
        Type: STRING
        Provider name: name
        Description: Optional. A mutable name for the rule. The name must contain only letters (a-z, A-Z), numbers (0-9), or hyphens (-). The maximum length is 63 characters. Must start with a letter. Must end with a number or a letter.

      • non_null_expectation
        Type: STRUCT
        Provider name: nonNullExpectation
        Description: Row-level rule which evaluates whether each column value is null.

      • range_expectation
        Type: STRUCT
        Provider name: rangeExpectation
        Description: Row-level rule which evaluates whether each column value lies within a specified range.

        • max_value
          Type: STRING
          Provider name: maxValue
          Description: Optional. The maximum column value allowed for a row to pass this validation. At least one of min_value and max_value needs to be provided.
        • min_value
          Type: STRING
          Provider name: minValue
          Description: Optional. The minimum column value allowed for a row to pass this validation. At least one of min_value and max_value needs to be provided.
        • strict_max_enabled
          Type: BOOLEAN
          Provider name: strictMaxEnabled
          Description: Optional. Whether each value needs to be strictly less than ('<') the maximum, or if equality is allowed. Only relevant if a max_value has been defined. Default = false.
        • strict_min_enabled
          Type: BOOLEAN
          Provider name: strictMinEnabled
          Description: Optional. Whether each value needs to be strictly greater than ('>') the minimum, or if equality is allowed. Only relevant if a min_value has been defined. Default = false.
      • regex_expectation
        Type: STRUCT
        Provider name: regexExpectation
        Description: Row-level rule which evaluates whether each column value matches a specified regex.

        • regex
          Type: STRING
          Provider name: regex
          Description: Optional. A regular expression the column value is expected to match.
      • row_condition_expectation
        Type: STRUCT
        Provider name: rowConditionExpectation
        Description: Row-level rule which evaluates whether each row in a table passes the specified condition.

        • sql_expression
          Type: STRING
          Provider name: sqlExpression
          Description: Optional. The SQL expression.
      • set_expectation
        Type: STRUCT
        Provider name: setExpectation
        Description: Row-level rule which evaluates whether each column value is contained by a specified set.

        • values
          Type: UNORDERED_LIST_STRING
          Provider name: values
          Description: Optional. Expected values for the column value.
      • sql_assertion
        Type: STRUCT
        Provider name: sqlAssertion
        Description: Aggregate rule which evaluates the number of rows returned for the provided statement. If any rows are returned, this rule fails.

        • sql_statement
          Type: STRING
          Provider name: sqlStatement
          Description: Optional. The SQL statement.
      • statistic_range_expectation
        Type: STRUCT
        Provider name: statisticRangeExpectation
        Description: Aggregate rule which evaluates whether the column aggregate statistic lies within a specified range.

        • max_value
          Type: STRING
          Provider name: maxValue
          Description: Optional. The maximum column statistic value allowed for a row to pass this validation. At least one of min_value and max_value needs to be provided.
        • min_value
          Type: STRING
          Provider name: minValue
          Description: Optional. The minimum column statistic value allowed for a row to pass this validation. At least one of min_value and max_value needs to be provided.
        • statistic
          Type: STRING
          Provider name: statistic
          Description: Optional. The aggregate metric to evaluate.
          Possible values:
          • STATISTIC_UNDEFINED - Unspecified statistic type
          • MEAN - Evaluate the column mean
          • MIN - Evaluate the column min
          • MAX - Evaluate the column max
        • strict_max_enabled
          Type: BOOLEAN
          Provider name: strictMaxEnabled
          Description: Optional. Whether the column statistic needs to be strictly less than ('<') the maximum, or if equality is allowed. Only relevant if a max_value has been defined. Default = false.
        • strict_min_enabled
          Type: BOOLEAN
          Provider name: strictMinEnabled
          Description: Optional. Whether the column statistic needs to be strictly greater than ('>') the minimum, or if equality is allowed. Only relevant if a min_value has been defined. Default = false.
      • suspended
        Type: BOOLEAN
        Provider name: suspended
        Description: Optional. Whether the Rule is active or suspended. Default is false.

      • table_condition_expectation
        Type: STRUCT
        Provider name: tableConditionExpectation
        Description: Aggregate rule which evaluates whether the provided expression is true for a table.

        • sql_expression
          Type: STRING
          Provider name: sqlExpression
          Description: Optional. The SQL expression.
      • threshold
        Type: DOUBLE
        Provider name: threshold
        Description: Optional. The minimum ratio of passing_rows / total_rows required to pass this rule, with a range of [0.0, 1.0]. 0 indicates the default value (i.e. 1.0). This field is only valid for row-level type rules.

      • uniqueness_expectation
        Type: STRUCT
        Provider name: uniquenessExpectation
        Description: Row-level rule which evaluates whether each column value is unique.

  • scanned_data
    Type: STRUCT
    Provider name: scannedData
    Description: Output only. The data scanned for this result.
    • incremental_field
      Type: STRUCT
      Provider name: incrementalField
      Description: The range denoted by values of an incremental field.
      • end
        Type: STRING
        Provider name: end
        Description: Value that marks the end of the range.
      • field
        Type: STRING
        Provider name: field
        Description: The field that contains values that monotonically increase over time (e.g. a timestamp column).
      • start
        Type: STRING
        Provider name: start
        Description: Value that marks the start of the range.
  • score
    Type: FLOAT
    Provider name: score
    Description: Output only. The overall data quality score. The score ranges between 0 and 100 (up to two decimal points).
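
The row-level result fields (evaluated_count, passed_count, pass_ratio, null_count, passed) follow from the rule settings in a fairly mechanical way. The Python sketch below walks through that arithmetic for a range_expectation rule. It is an illustration under assumptions: the function names are hypothetical, bounds are plain numbers rather than the STRING values in the schema, and null rows are dropped from evaluated_count when ignore_null is true, following the evaluated_count description.

```python
def in_range(value, min_value=None, max_value=None,
             strict_min_enabled=False, strict_max_enabled=False):
    """Check one non-null value against the configured bounds."""
    if min_value is None and max_value is None:
        raise ValueError("at least one of min_value and max_value must be provided")
    if min_value is not None:
        if value < min_value or (strict_min_enabled and value == min_value):
            return False
    if max_value is not None:
        if value > max_value or (strict_max_enabled and value == max_value):
            return False
    return True


def evaluate_range_rule(values, threshold=0.0, ignore_null=False, **bounds):
    """Hypothetical helper: derive the result fields for a list of column values."""
    effective_threshold = threshold or 1.0  # a threshold of 0 means the default, 1.0
    null_count = sum(v is None for v in values)
    # Assumption: with ignore_null, null rows are left out of evaluated_count.
    rows = [v for v in values if v is not None] if ignore_null else list(values)
    if not rows:
        raise ValueError("no rows to evaluate")
    # Null rows that are still evaluated fail automatically.
    passed_count = sum(v is not None and in_range(v, **bounds) for v in rows)
    pass_ratio = passed_count / len(rows)
    return {
        "evaluated_count": len(rows),
        "passed_count": passed_count,
        "null_count": null_count,
        "pass_ratio": pass_ratio,
        "passed": pass_ratio >= effective_threshold,
    }
```

With the default threshold, a single failing or null row is enough to fail the rule. Lowering the threshold or setting ignore_null changes the outcome without changing the data.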

data_quality_spec

Type: STRUCT
Provider name: dataQualitySpec
Description: Settings for a data quality scan.

  • post_scan_actions
    Type: STRUCT
    Provider name: postScanActions
    Description: Optional. Actions to take upon job completion.
    • bigquery_export
      Type: STRUCT
      Provider name: bigqueryExport
      Description: Optional. If set, results will be exported to the provided BigQuery table.
      • results_table
        Type: STRING
        Provider name: resultsTable
        Description: Optional. The BigQuery table to export DataQualityScan results to. Format: //bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID or projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID
    • notification_report
      Type: STRUCT
      Provider name: notificationReport
      Description: Optional. If set, results will be sent to the provided notification recipients upon triggers.
      • job_end_trigger
        Type: STRUCT
        Provider name: jobEndTrigger
        Description: Optional. If set, report will be sent when a scan job ends.

      • job_failure_trigger
        Type: STRUCT
        Provider name: jobFailureTrigger
        Description: Optional. If set, report will be sent when a scan job fails.

      • recipients
        Type: STRUCT
        Provider name: recipients
        Description: Required. The recipients who will receive the notification report.

        • emails
          Type: UNORDERED_LIST_STRING
          Provider name: emails
          Description: Optional. The email recipients who will receive the DataQualityScan results report.
      • score_threshold_trigger
        Type: STRUCT
        Provider name: scoreThresholdTrigger
        Description: Optional. If set, report will be sent when score threshold is met.

        • score_threshold
          Type: FLOAT
          Provider name: scoreThreshold
          Description: Optional. The score range is in [0, 100].
  • row_filter
    Type: STRING
    Provider name: rowFilter
    Description: Optional. A filter applied to all rows in a single DataScan job. The filter needs to be a valid SQL expression for a WHERE clause in GoogleSQL syntax (https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#where_clause). Example: col1 >= 0 AND col2 < 10
  • rules
    Type: UNORDERED_LIST_STRUCT
    Provider name: rules
    Description: Required. The list of rules to evaluate against a data source. At least one rule is required.
    • column
      Type: STRING
      Provider name: column
      Description: Optional. The unnested column which this rule is evaluated against.

    • description
      Type: STRING
      Provider name: description
      Description: Optional. Description of the rule. The maximum length is 1,024 characters.

    • dimension
      Type: STRING
      Provider name: dimension
      Description: Required. The dimension a rule belongs to. Results are also aggregated at the dimension level. Supported dimensions are “COMPLETENESS”, “ACCURACY”, “CONSISTENCY”, “VALIDITY”, “UNIQUENESS”, “FRESHNESS”, “VOLUME”

    • ignore_null
      Type: BOOLEAN
      Provider name: ignoreNull
      Description: Optional. Rows with null values will automatically fail a rule, unless ignore_null is true. In that case, such null rows are trivially considered passing. This field is only valid for the following types of rules: RangeExpectation, RegexExpectation, SetExpectation, UniquenessExpectation.

    • name
      Type: STRING
      Provider name: name
      Description: Optional. A mutable name for the rule. The name must contain only letters (a-z, A-Z), numbers (0-9), or hyphens (-). The maximum length is 63 characters. Must start with a letter. Must end with a number or a letter.

    • non_null_expectation
      Type: STRUCT
      Provider name: nonNullExpectation
      Description: Row-level rule which evaluates whether each column value is null.

    • range_expectation
      Type: STRUCT
      Provider name: rangeExpectation
      Description: Row-level rule which evaluates whether each column value lies between a specified range.

      • max_value
        Type: STRING
        Provider name: maxValue
        Description: Optional. The maximum column value allowed for a row to pass this validation. At least one of min_value and max_value needs to be provided.
      • min_value
        Type: STRING
        Provider name: minValue
        Description: Optional. The minimum column value allowed for a row to pass this validation. At least one of min_value and max_value needs to be provided.
      • strict_max_enabled
        Type: BOOLEAN
        Provider name: strictMaxEnabled
        Description: Optional. Whether each value needs to be strictly less than ('<') the maximum, or if equality is allowed. Only relevant if a max_value has been defined. Default = false.
      • strict_min_enabled
        Type: BOOLEAN
        Provider name: strictMinEnabled
        Description: Optional. Whether each value needs to be strictly greater than ('>') the minimum, or if equality is allowed. Only relevant if a min_value has been defined. Default = false.
    • regex_expectation
      Type: STRUCT
      Provider name: regexExpectation
      Description: Row-level rule which evaluates whether each column value matches a specified regex.

      • regex
        Type: STRING
        Provider name: regex
        Description: Optional. A regular expression the column value is expected to match.
    • row_condition_expectation
      Type: STRUCT
      Provider name: rowConditionExpectation
      Description: Row-level rule which evaluates whether each row in a table passes the specified condition.

      • sql_expression
        Type: STRING
        Provider name: sqlExpression
        Description: Optional. The SQL expression.
    • set_expectation
      Type: STRUCT
      Provider name: setExpectation
      Description: Row-level rule which evaluates whether each column value is contained by a specified set.

      • values
        Type: UNORDERED_LIST_STRING
        Provider name: values
        Description: Optional. Expected values for the column value.
    • sql_assertion
      Type: STRUCT
      Provider name: sqlAssertion
      Description: Aggregate rule which evaluates the number of rows returned for the provided statement. If any rows are returned, this rule fails.

      • sql_statement
        Type: STRING
        Provider name: sqlStatement
        Description: Optional. The SQL statement.
    • statistic_range_expectation
      Type: STRUCT
      Provider name: statisticRangeExpectation
      Description: Aggregate rule which evaluates whether the column aggregate statistic lies between a specified range.

      • max_value
        Type: STRING
        Provider name: maxValue
        Description: Optional. The maximum column statistic value allowed for a row to pass this validation. At least one of min_value and max_value needs to be provided.
      • min_value
        Type: STRING
        Provider name: minValue
        Description: Optional. The minimum column statistic value allowed for a row to pass this validation. At least one of min_value and max_value needs to be provided.
      • statistic
        Type: STRING
        Provider name: statistic
        Description: Optional. The aggregate metric to evaluate.
        Possible values:
        • STATISTIC_UNDEFINED - Unspecified statistic type
        • MEAN - Evaluate the column mean
        • MIN - Evaluate the column min
        • MAX - Evaluate the column max
      • strict_max_enabled
        Type: BOOLEAN
        Provider name: strictMaxEnabled
        Description: Optional. Whether the column statistic needs to be strictly less than ('<') the maximum, or if equality is allowed. Only relevant if a max_value has been defined. Default = false.
      • strict_min_enabled
        Type: BOOLEAN
        Provider name: strictMinEnabled
        Description: Optional. Whether the column statistic needs to be strictly greater than ('>') the minimum, or if equality is allowed. Only relevant if a min_value has been defined. Default = false.
    • suspended
      Type: BOOLEAN
      Provider name: suspended
      Description: Optional. Whether the Rule is active or suspended. Default is false.

    • table_condition_expectation
      Type: STRUCT
      Provider name: tableConditionExpectation
      Description: Aggregate rule which evaluates whether the provided expression is true for a table.

      • sql_expression
        Type: STRING
        Provider name: sqlExpression
        Description: Optional. The SQL expression.
    • threshold
      Type: DOUBLE
      Provider name: threshold
      Description: Optional. The minimum ratio of passing_rows / total_rows required to pass this rule, with a range of [0.0, 1.0]. 0 indicates the default value (i.e. 1.0). This field is only valid for row-level type rules.

    • uniqueness_expectation
      Type: STRUCT
      Provider name: uniquenessExpectation
      Description: Row-level rule which evaluates whether each column value is unique.

  • sampling_percent
    Type: FLOAT
    Provider name: samplingPercent
    Description: Optional. The percentage of the records to be selected from the dataset for DataScan. Value can range between 0.0 and 100.0 with up to 3 significant decimal digits. Sampling is not applied if sampling_percent is not specified, 0 or 100.
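
To make the interaction between range_expectation, ignore_null, and threshold more concrete, here is a minimal Python sketch. It is an illustration only: the function names value_passes_range and rule_passes are hypothetical, the STRING min_value/max_value fields are assumed to hold numeric text, and treating an empty input as passing is an assumption. The actual service evaluates rules against the data source itself, not in client code.

```python
from typing import Optional, Sequence


def value_passes_range(
    value: Optional[float],
    min_value: Optional[str] = None,
    max_value: Optional[str] = None,
    strict_min_enabled: bool = False,
    strict_max_enabled: bool = False,
    ignore_null: bool = False,
) -> bool:
    """Hypothetical helper mirroring the range_expectation fields."""
    if min_value is None and max_value is None:
        raise ValueError("at least one of min_value and max_value is required")
    if value is None:
        # Null rows fail unless ignore_null is true, in which case they pass.
        return ignore_null
    if min_value is not None:
        lo = float(min_value)  # assumption: the bound is numeric text
        if value < lo or (strict_min_enabled and value == lo):
            return False
    if max_value is not None:
        hi = float(max_value)
        if value > hi or (strict_max_enabled and value == hi):
            return False
    return True


def rule_passes(row_results: Sequence[bool], threshold: float = 0.0) -> bool:
    """Compare passing_rows / total_rows with the rule threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0.0, 1.0]")
    # A threshold of 0 indicates the default value, i.e. 1.0.
    effective = 1.0 if threshold == 0.0 else threshold
    if not row_results:
        return True  # assumption: an empty input trivially passes
    return sum(row_results) / len(row_results) >= effective
```

For example, checking the values 5.0, 10.0, None, and 12.0 against min_value="5" and max_value="10" gives two passing rows out of four, so rule_passes returns True for threshold=0.5 and False for the default threshold.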

description

Type: STRING
Provider name: description
Description: Optional. Description of the scan. Must be between 1 and 1024 characters.

execution_spec

Type: STRUCT
Provider name: executionSpec
Description: Optional. DataScan execution settings. If not specified, the fields in it will use their default values.

  • field
    Type: STRING
    Provider name: field
    Description: Immutable. The unnested field (of type Date or Timestamp) that contains values which monotonically increase over time. If not specified, a data scan will run for all data in the table.
  • trigger
    Type: STRUCT
    Provider name: trigger
    Description: Optional. Spec related to how often and when a scan should be triggered. If not specified, the default is OnDemand, which means the scan will not run until the user calls the RunDataScan API.
    • on_demand
      Type: STRUCT
      Provider name: onDemand
      Description: The scan runs once via the RunDataScan API.

    • schedule
      Type: STRUCT
      Provider name: schedule
      Description: The scan is scheduled to run periodically.

      • cron
        Type: STRING
        Provider name: cron
        Description: Required. Cron (https://en.wikipedia.org/wiki/Cron) schedule for running scans periodically. To explicitly set a timezone in the cron tab, apply a prefix in the cron tab: “CRON_TZ=${IANA_TIME_ZONE}” or “TZ=${IANA_TIME_ZONE}”. The ${IANA_TIME_ZONE} may only be a valid string from the IANA time zone database (https://en.wikipedia.org/wiki/List_of_tz_database_time_zones#List). For example, CRON_TZ=America/New_York 1 * * * *, or TZ=America/New_York 1 * * * *. This field is required for Schedule scans.
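
The optional time zone prefix described for the cron field can be separated from the schedule with a few lines of Python. This is only a sketch: split_cron_timezone is a hypothetical helper name, and it does not check the zone against the IANA database (the standard-library zoneinfo module could be used for that where time zone data is installed).

```python
from typing import Optional, Tuple

# Prefixes named in the cron field description.
_TZ_PREFIXES = ("CRON_TZ=", "TZ=")


def split_cron_timezone(cron: str) -> Tuple[Optional[str], str]:
    """Split an optional CRON_TZ=/TZ= prefix from a cron string.

    Returns (time_zone, schedule); time_zone is None when no prefix is present.
    """
    text = cron.strip()
    for prefix in _TZ_PREFIXES:
        if text.startswith(prefix):
            # The zone name runs up to the first space; the rest is the schedule.
            zone, _, schedule = text[len(prefix):].partition(" ")
            if not zone or not schedule.strip():
                raise ValueError(f"malformed cron value: {cron!r}")
            return zone, schedule.strip()
    return None, text
```

Called with the example value CRON_TZ=America/New_York 1 * * * *, the helper returns the zone America/New_York and the schedule 1 * * * *. A value without a prefix is returned unchanged with a zone of None.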

execution_status

Type: STRUCT
Provider name: executionStatus
Description: Output only. Status of the data scan execution.

  • latest_job_create_time
    Type: TIMESTAMP
    Provider name: latestJobCreateTime
    Description: Optional. The time when the latest DataScanJob execution was created.
  • latest_job_end_time
    Type: TIMESTAMP
    Provider name: latestJobEndTime
    Description: Optional. The time when the latest DataScanJob ended.
  • latest_job_start_time
    Type: TIMESTAMP
    Provider name: latestJobStartTime
    Description: Optional. The time when the latest DataScanJob started.

gcp_display_name

Type: STRING
Provider name: displayName
Description: Optional. User-friendly display name. Must be between 1 and 256 characters.

labels

Type: UNORDERED_LIST_STRING

name

Type: STRING
Provider name: name
Description: Output only. Identifier. The relative resource name of the scan, of the form: projects/{project}/locations/{location_id}/dataScans/{datascan_id}, where project refers to a project_id or project_number and location_id refers to a GCP region.
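
The name format above can be split into its parts with a regular expression. The sketch below is illustrative: parse_data_scan_name is a hypothetical helper, and it only assumes that no component contains a slash.

```python
import re
from typing import Dict

# Pattern for projects/{project}/locations/{location_id}/dataScans/{datascan_id}.
_DATA_SCAN_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location_id>[^/]+)"
    r"/dataScans/(?P<datascan_id>[^/]+)$"
)


def parse_data_scan_name(name: str) -> Dict[str, str]:
    """Return the project, location_id, and datascan_id parts of a scan name."""
    match = _DATA_SCAN_NAME.match(name)
    if match is None:
        raise ValueError(f"not a DataScan resource name: {name!r}")
    return match.groupdict()
```

For a made-up name such as projects/my-project/locations/us-central1/dataScans/my-scan, the helper returns a dictionary with project "my-project", location_id "us-central1", and datascan_id "my-scan".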

organization_id

Type: STRING

parent

Type: STRING

project_id

Type: STRING

project_number

Type: STRING

resource_name

Type: STRING

state

Type: STRING
Provider name: state
Description: Output only. Current state of the DataScan.
Possible values:

  • STATE_UNSPECIFIED - State is not specified.
  • ACTIVE - Resource is active, i.e., ready to use.
  • CREATING - Resource is under creation.
  • DELETING - Resource is under deletion.
  • ACTION_REQUIRED - Resource is active but has unresolved actions.

tags

Type: UNORDERED_LIST_STRING

type

Type: STRING
Provider name: type
Description: Output only. The type of DataScan.
Possible values:

  • DATA_SCAN_TYPE_UNSPECIFIED - The data scan type is unspecified.
  • DATA_QUALITY - Data quality scan.
  • DATA_PROFILE - Data profile scan.
  • DATA_DISCOVERY - Data discovery scan.

uid

Type: STRING
Provider name: uid
Description: Output only. System generated globally unique ID for the scan. This ID will be different if the scan is deleted and re-created with the same name.

update_time

Type: TIMESTAMP
Provider name: updateTime
Description: Output only. The time when the scan was last updated.
