gcp_dataproc_job

ancestors

Type: UNORDERED_LIST_STRING

done

Type: BOOLEAN
Provider name: done
Description: Output only. Indicates whether the job is completed. If the value is false, the job is still in progress. If true, the job is completed, and the status.state field indicates whether it was successful, failed, or cancelled.

driver_control_files_uri

Type: STRING
Provider name: driverControlFilesUri
Description: Output only. If present, the location of miscellaneous control files which may be used as part of job setup and handling. If not present, control files may be placed in the same location as driver_output_uri.

driver_output_resource_uri

Type: STRING
Provider name: driverOutputResourceUri
Description: Output only. A URI pointing to the location of the stdout of the job’s driver program.

driver_scheduling_config

Type: STRUCT
Provider name: driverSchedulingConfig
Description: Optional. Driver scheduling configuration.

  • memory_mb
    Type: INT32
    Provider name: memoryMb
    Description: Required. The amount of memory in MB the driver is requesting.
  • vcores
    Type: INT32
    Provider name: vcores
    Description: Required. The number of vCPUs the driver is requesting.
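
For illustration only, a driver scheduling stanza might look like the following sketch, using the snake_case field names above; the values are arbitrary placeholders, not recommendations.

```python
# Hypothetical driver scheduling stanza; the values are placeholders, not recommendations.
driver_scheduling_config = {
    "memory_mb": 2048,  # memory requested for the driver, in MB
    "vcores": 2,        # vCPUs requested for the driver
}
```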

gcp_status

Type: STRUCT
Provider name: status
Description: Output only. The job status. Additional application-specific status information may be contained in the type_job and yarn_applications fields.

  • details
    Type: STRING
    Provider name: details
    Description: Optional. Output only. Job state details, such as an error description if the state is ERROR.
  • state
    Type: STRING
    Provider name: state
    Description: Output only. A state message specifying the overall job state.
    Possible values:
    • STATE_UNSPECIFIED - The job state is unknown.
    • PENDING - The job is pending; it has been submitted, but is not yet running.
    • SETUP_DONE - Job has been received by the service and completed initial setup; it will soon be submitted to the cluster.
    • RUNNING - The job is running on the cluster.
    • CANCEL_PENDING - A CancelJob request has been received, but is pending.
    • CANCEL_STARTED - Transient in-flight resources have been canceled, and the request to cancel the running job has been issued to the cluster.
    • CANCELLED - The job cancellation was successful.
    • DONE - The job has completed successfully.
    • ERROR - The job has completed, but encountered an error.
    • ATTEMPT_FAILURE - Job attempt has failed. The detail field contains failure details for this attempt. Applies to restartable jobs only.
  • state_start_time
    Type: TIMESTAMP
    Provider name: stateStartTime
    Description: Output only. The time when this state was entered.
  • substate
    Type: STRING
    Provider name: substate
    Description: Output only. Additional state information, which includes status reported by the agent.
    Possible values:
    • UNSPECIFIED - The job substate is unknown.
    • SUBMITTED - The Job is submitted to the agent. Applies to RUNNING state.
    • QUEUED - The Job has been received and is awaiting execution (it may be waiting for a condition to be met). See the 'details' field for the reason for the delay. Applies to RUNNING state.
    • STALE_STATUS - The agent-reported status is out of date, which may be caused by a loss of communication between the agent and Dataproc. If the agent does not send a timely update, the job will fail. Applies to RUNNING state.

hadoop_job

Type: STRUCT
Provider name: hadoopJob
Description: Optional. Job is a Hadoop job.

  • archive_uris
    Type: UNORDERED_LIST_STRING
    Provider name: archiveUris
    Description: Optional. HCFS URIs of archives to be extracted in the working directory of Hadoop drivers and tasks. Supported file types: .jar, .tar, .tar.gz, .tgz, or .zip.

  • args
    Type: UNORDERED_LIST_STRING
    Provider name: args
    Description: Optional. The arguments to pass to the driver. Do not include arguments, such as -libjars or -Dfoo=bar, that can be set as job properties, since a collision may occur that causes an incorrect job submission.

  • file_uris
    Type: UNORDERED_LIST_STRING
    Provider name: fileUris
    Description: Optional. HCFS (Hadoop Compatible Filesystem) URIs of files to be copied to the working directory of Hadoop drivers and distributed tasks. Useful for naively parallel tasks.

  • jar_file_uris
    Type: UNORDERED_LIST_STRING
    Provider name: jarFileUris
    Description: Optional. Jar file URIs to add to the CLASSPATHs of the Hadoop driver and tasks.

  • logging_config
    Type: STRUCT
    Provider name: loggingConfig
    Description: Optional. The runtime log config for job execution.

  • main_class
    Type: STRING
    Provider name: mainClass
    Description: The name of the driver’s main class. The jar file containing the class must be in the default CLASSPATH or specified in jar_file_uris.

  • main_jar_file_uri
    Type: STRING
    Provider name: mainJarFileUri
    Description: The HCFS URI of the jar file containing the main class. Examples: 'gs://foo-bucket/analytics-binaries/extract-useful-metrics-mr.jar' 'hdfs:/tmp/test-samples/custom-wordcount.jar' 'file:///home/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar'
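
As a hedged sketch only, a Hadoop job stanza built from the fields above might look like this; the bucket, jar, and argument values are placeholders.

```python
# Hypothetical Hadoop job stanza; bucket, jar, and argument values are placeholders.
hadoop_job = {
    "main_jar_file_uri": "gs://example-bucket/jars/wordcount.jar",  # or set main_class instead
    "args": ["gs://example-bucket/input/", "gs://example-bucket/output/"],
    "jar_file_uris": ["gs://example-bucket/jars/extra-lib.jar"],
}
```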

hive_job

Type: STRUCT
Provider name: hiveJob
Description: Optional. Job is a Hive job.

  • continue_on_failure
    Type: BOOLEAN
    Provider name: continueOnFailure
    Description: Optional. Whether to continue executing queries if a query fails. The default value is false. Setting to true can be useful when executing independent parallel queries.
  • jar_file_uris
    Type: UNORDERED_LIST_STRING
    Provider name: jarFileUris
    Description: Optional. HCFS URIs of jar files to add to the CLASSPATH of the Hive server and Hadoop MapReduce (MR) tasks. Can contain Hive SerDes and UDFs.
  • query_file_uri
    Type: STRING
    Provider name: queryFileUri
    Description: The HCFS URI of the script that contains Hive queries.
  • query_list
    Type: STRUCT
    Provider name: queryList
    Description: A list of queries.
    • queries
      Type: UNORDERED_LIST_STRING
      Provider name: queries
      Description: Required. The queries to execute. You do not need to end a query expression with a semicolon. Multiple queries can be specified in one string by separating each with a semicolon. Here is an example of a Dataproc API snippet that uses a QueryList to specify a HiveJob: "hiveJob": { "queryList": { "queries": [ "query1", "query2", "query3;query4" ] } }
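
A minimal sketch of a Hive job using an inline query_list; the queries and table name are placeholders.

```python
# Hypothetical Hive job stanza with an inline query list; queries are placeholders.
hive_job = {
    "continue_on_failure": True,
    "query_list": {
        "queries": [
            "SHOW DATABASES",
            "SELECT COUNT(*) FROM example_table",  # example_table is a placeholder
        ]
    },
}
```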

job_uuid

Type: STRING
Provider name: jobUuid
Description: Output only. A UUID that uniquely identifies a job within the project over time. This is in contrast to a user-settable reference.job_id that may be reused over time.

labels

Type: UNORDERED_LIST_STRING

organization_id

Type: STRING

parent

Type: STRING

pig_job

Type: STRUCT
Provider name: pigJob
Description: Optional. Job is a Pig job.

  • continue_on_failure
    Type: BOOLEAN
    Provider name: continueOnFailure
    Description: Optional. Whether to continue executing queries if a query fails. The default value is false. Setting to true can be useful when executing independent parallel queries.

  • jar_file_uris
    Type: UNORDERED_LIST_STRING
    Provider name: jarFileUris
    Description: Optional. HCFS URIs of jar files to add to the CLASSPATH of the Pig Client and Hadoop MapReduce (MR) tasks. Can contain Pig UDFs.

  • logging_config
    Type: STRUCT
    Provider name: loggingConfig
    Description: Optional. The runtime log config for job execution.

  • query_file_uri
    Type: STRING
    Provider name: queryFileUri
    Description: The HCFS URI of the script that contains the Pig queries.

  • query_list
    Type: STRUCT
    Provider name: queryList
    Description: A list of queries.

    • queries
      Type: UNORDERED_LIST_STRING
      Provider name: queries
      Description: Required. The queries to execute. You do not need to end a query expression with a semicolon. Multiple queries can be specified in one string by separating each with a semicolon. Here is an example of a Dataproc API snippet that uses a QueryList to specify a HiveJob: "hiveJob": { "queryList": { "queries": [ "query1", "query2", "query3;query4" ] } }
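
A sketch of the query_file_uri variant for a Pig job; the script path and UDF jar are placeholders.

```python
# Hypothetical Pig job stanza pointing at a script in Cloud Storage; paths are placeholders.
pig_job = {
    "query_file_uri": "gs://example-bucket/scripts/transform.pig",
    "jar_file_uris": ["gs://example-bucket/jars/pig-udfs.jar"],  # optional UDF jars
}
```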

placement

Type: STRUCT
Provider name: placement
Description: Required. Job information, including how, when, and where to run the job.

  • cluster_name
    Type: STRING
    Provider name: clusterName
    Description: Required. The name of the cluster where the job will be submitted.
  • cluster_uuid
    Type: STRING
    Provider name: clusterUuid
    Description: Output only. A cluster UUID generated by the Dataproc service when the job is submitted.
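
A minimal placement sketch; cluster_uuid is output only, so only the cluster name (a placeholder here) would be supplied on submission.

```python
# Hypothetical placement stanza; the cluster name is a placeholder.
placement = {
    "cluster_name": "example-cluster",
}
```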

presto_job

Type: STRUCT
Provider name: prestoJob
Description: Optional. Job is a Presto job.

  • client_tags
    Type: UNORDERED_LIST_STRING
    Provider name: clientTags
    Description: Optional. Presto client tags to attach to this query.

  • continue_on_failure
    Type: BOOLEAN
    Provider name: continueOnFailure
    Description: Optional. Whether to continue executing queries if a query fails. The default value is false. Setting to true can be useful when executing independent parallel queries.

  • logging_config
    Type: STRUCT
    Provider name: loggingConfig
    Description: Optional. The runtime log config for job execution.

  • output_format
    Type: STRING
    Provider name: outputFormat
    Description: Optional. The format in which query output will be displayed. See the Presto documentation for supported output formats.

  • query_file_uri
    Type: STRING
    Provider name: queryFileUri
    Description: The HCFS URI of the script that contains SQL queries.

  • query_list
    Type: STRUCT
    Provider name: queryList
    Description: A list of queries.

    • queries
      Type: UNORDERED_LIST_STRING
      Provider name: queries
      Description: Required. The queries to execute. You do not need to end a query expression with a semicolon. Multiple queries can be specified in one string by separating each with a semicolon. Here is an example of a Dataproc API snippet that uses a QueryList to specify a HiveJob: "hiveJob": { "queryList": { "queries": [ "query1", "query2", "query3;query4" ] } }
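
A hedged Presto job sketch; the tags, output format, and query below are placeholders.

```python
# Hypothetical Presto job stanza; tags, format, and query are placeholders.
presto_job = {
    "client_tags": ["team:analytics"],
    "output_format": "CSV",  # assumed to be a format Presto supports; check the Presto docs
    "query_list": {"queries": ["SELECT 1"]},
}
```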

project_id

Type: STRING

project_number

Type: STRING

pyspark_job

Type: STRUCT
Provider name: pysparkJob
Description: Optional. Job is a PySpark job.

  • archive_uris
    Type: UNORDERED_LIST_STRING
    Provider name: archiveUris
    Description: Optional. HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.

  • args
    Type: UNORDERED_LIST_STRING
    Provider name: args
    Description: Optional. The arguments to pass to the driver. Do not include arguments, such as --conf, that can be set as job properties, since a collision may occur that causes an incorrect job submission.

  • file_uris
    Type: UNORDERED_LIST_STRING
    Provider name: fileUris
    Description: Optional. HCFS URIs of files to be placed in the working directory of each executor. Useful for naively parallel tasks.

  • jar_file_uris
    Type: UNORDERED_LIST_STRING
    Provider name: jarFileUris
    Description: Optional. HCFS URIs of jar files to add to the CLASSPATHs of the Python driver and tasks.

  • logging_config
    Type: STRUCT
    Provider name: loggingConfig
    Description: Optional. The runtime log config for job execution.

  • main_python_file_uri
    Type: STRING
    Provider name: mainPythonFileUri
    Description: Required. The HCFS URI of the main Python file to use as the driver. Must be a .py file.

  • python_file_uris
    Type: UNORDERED_LIST_STRING
    Provider name: pythonFileUris
    Description: Optional. HCFS file URIs of Python files to pass to the PySpark framework. Supported file types: .py, .egg, and .zip.
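
A minimal PySpark job sketch; the URIs and application argument are placeholders.

```python
# Hypothetical PySpark job stanza; all URIs and arguments are placeholders.
pyspark_job = {
    "main_python_file_uri": "gs://example-bucket/pyspark/main.py",
    "python_file_uris": ["gs://example-bucket/pyspark/helpers.py"],
    "args": ["--date=2024-01-01"],  # application argument, not a Spark property
}
```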

reference

Type: STRUCT
Provider name: reference
Description: Optional. The fully qualified reference to the job, which can be used to obtain the equivalent REST path of the job resource. If this property is not specified when a job is created, the server generates a job_id.

  • job_id
    Type: STRING
    Provider name: jobId
    Description: Optional. The job ID, which must be unique within the project. The ID must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), or hyphens (-). The maximum length is 100 characters. If not specified by the caller, the job ID will be provided by the server.
  • project_id
    Type: STRING
    Provider name: projectId
    Description: Optional. The ID of the Google Cloud Platform project that the job belongs to. If specified, must match the request project ID.
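
A sketch of an explicit reference; omitting it entirely lets the server generate the job ID. Both values below are placeholders.

```python
# Hypothetical job reference; omit it to let the server generate a job_id.
reference = {
    "project_id": "example-project",       # must match the request project ID if set
    "job_id": "daily-ingest-2024-01-01",   # placeholder; must be unique within the project
}
```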

resource_name

Type: STRING

scheduling

Type: STRUCT
Provider name: scheduling
Description: Optional. Job scheduling configuration.

  • max_failures_per_hour
    Type: INT32
    Provider name: maxFailuresPerHour
    Description: Optional. Maximum number of times per hour a driver may be restarted as a result of the driver exiting with a non-zero code before the job is reported failed. A job may be reported as thrashing if the driver exits with a non-zero code four times within a 10-minute window. Maximum value is 10. Note: This restartable job option is not supported in Dataproc workflow templates (https://cloud.google.com/dataproc/docs/concepts/workflows/using-workflows#adding_jobs_to_a_template).
  • max_failures_total
    Type: INT32
    Provider name: maxFailuresTotal
    Description: Optional. Maximum total number of times a driver may be restarted as a result of the driver exiting with a non-zero code. After the maximum number is reached, the job will be reported as failed. Maximum value is 240. Note: Currently, this restartable job option is not supported in Dataproc workflow templates (https://cloud.google.com/dataproc/docs/concepts/workflows/using-workflows#adding_jobs_to_a_template).
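
A sketch of a restartable-job scheduling stanza; both values are placeholders chosen within the documented maximums.

```python
# Hypothetical scheduling stanza; values are placeholders within the documented maximums.
scheduling = {
    "max_failures_per_hour": 5,   # documented maximum is 10
    "max_failures_total": 20,     # documented maximum is 240
}
```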

spark_job

Type: STRUCT
Provider name: sparkJob
Description: Optional. Job is a Spark job.

  • archive_uris
    Type: UNORDERED_LIST_STRING
    Provider name: archiveUris
    Description: Optional. HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.

  • args
    Type: UNORDERED_LIST_STRING
    Provider name: args
    Description: Optional. The arguments to pass to the driver. Do not include arguments, such as --conf, that can be set as job properties, since a collision may occur that causes an incorrect job submission.

  • file_uris
    Type: UNORDERED_LIST_STRING
    Provider name: fileUris
    Description: Optional. HCFS URIs of files to be placed in the working directory of each executor. Useful for naively parallel tasks.

  • jar_file_uris
    Type: UNORDERED_LIST_STRING
    Provider name: jarFileUris
    Description: Optional. HCFS URIs of jar files to add to the CLASSPATHs of the Spark driver and tasks.

  • logging_config
    Type: STRUCT
    Provider name: loggingConfig
    Description: Optional. The runtime log config for job execution.

  • main_class
    Type: STRING
    Provider name: mainClass
    Description: The name of the driver’s main class. The jar file that contains the class must be in the default CLASSPATH or specified in jar_file_uris.

  • main_jar_file_uri
    Type: STRING
    Provider name: mainJarFileUri
    Description: The HCFS URI of the jar file that contains the main class.
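
A minimal Spark job sketch using main_class; the class, jar, and bucket names are placeholders.

```python
# Hypothetical Spark job stanza; class and jar names are placeholders.
spark_job = {
    "main_class": "com.example.SparkPipeline",
    "jar_file_uris": ["gs://example-bucket/jars/spark-pipeline.jar"],
    "args": ["gs://example-bucket/input/", "gs://example-bucket/output/"],
}
```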

spark_r_job

Type: STRUCT
Provider name: sparkRJob
Description: Optional. Job is a SparkR job.

  • archive_uris
    Type: UNORDERED_LIST_STRING
    Provider name: archiveUris
    Description: Optional. HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.

  • args
    Type: UNORDERED_LIST_STRING
    Provider name: args
    Description: Optional. The arguments to pass to the driver. Do not include arguments, such as --conf, that can be set as job properties, since a collision may occur that causes an incorrect job submission.

  • file_uris
    Type: UNORDERED_LIST_STRING
    Provider name: fileUris
    Description: Optional. HCFS URIs of files to be placed in the working directory of each executor. Useful for naively parallel tasks.

  • logging_config
    Type: STRUCT
    Provider name: loggingConfig
    Description: Optional. The runtime log config for job execution.

  • main_r_file_uri
    Type: STRING
    Provider name: mainRFileUri
    Description: Required. The HCFS URI of the main R file to use as the driver. Must be a .R file.
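
A minimal SparkR job sketch; the script and data paths are placeholders.

```python
# Hypothetical SparkR job stanza; paths are placeholders.
spark_r_job = {
    "main_r_file_uri": "gs://example-bucket/sparkr/analysis.R",
    "args": ["gs://example-bucket/data.csv"],  # placeholder input path
}
```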

spark_sql_job

Type: STRUCT
Provider name: sparkSqlJob
Description: Optional. Job is a SparkSql job.

  • jar_file_uris
    Type: UNORDERED_LIST_STRING
    Provider name: jarFileUris
    Description: Optional. HCFS URIs of jar files to be added to the Spark CLASSPATH.

  • logging_config
    Type: STRUCT
    Provider name: loggingConfig
    Description: Optional. The runtime log config for job execution.

  • query_file_uri
    Type: STRING
    Provider name: queryFileUri
    Description: The HCFS URI of the script that contains SQL queries.

  • query_list
    Type: STRUCT
    Provider name: queryList
    Description: A list of queries.

    • queries
      Type: UNORDERED_LIST_STRING
      Provider name: queries
      Description: Required. The queries to execute. You do not need to end a query expression with a semicolon. Multiple queries can be specified in one string by separating each with a semicolon. Here is an example of a Dataproc API snippet that uses a QueryList to specify a HiveJob: "hiveJob": { "queryList": { "queries": [ "query1", "query2", "query3;query4" ] } }
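
A Spark SQL sketch using an inline query_list (a job would supply either query_file_uri or query_list); queries and jar paths are placeholders.

```python
# Hypothetical Spark SQL job stanza; queries and the UDF jar path are placeholders.
spark_sql_job = {
    "query_list": {"queries": ["SHOW TABLES", "SELECT COUNT(*) FROM example_table"]},
    "jar_file_uris": ["gs://example-bucket/jars/custom-udf.jar"],  # optional UDF jar
}
```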

status_history

Type: UNORDERED_LIST_STRUCT
Provider name: statusHistory
Description: Output only. The previous job status.

  • details
    Type: STRING
    Provider name: details
    Description: Optional. Output only. Job state details, such as an error description if the state is ERROR.
  • state
    Type: STRING
    Provider name: state
    Description: Output only. A state message specifying the overall job state.
    Possible values:
    • STATE_UNSPECIFIED - The job state is unknown.
    • PENDING - The job is pending; it has been submitted, but is not yet running.
    • SETUP_DONE - Job has been received by the service and completed initial setup; it will soon be submitted to the cluster.
    • RUNNING - The job is running on the cluster.
    • CANCEL_PENDING - A CancelJob request has been received, but is pending.
    • CANCEL_STARTED - Transient in-flight resources have been canceled, and the request to cancel the running job has been issued to the cluster.
    • CANCELLED - The job cancellation was successful.
    • DONE - The job has completed successfully.
    • ERROR - The job has completed, but encountered an error.
    • ATTEMPT_FAILURE - Job attempt has failed. The detail field contains failure details for this attempt. Applies to restartable jobs only.
  • state_start_time
    Type: TIMESTAMP
    Provider name: stateStartTime
    Description: Output only. The time when this state was entered.
  • substate
    Type: STRING
    Provider name: substate
    Description: Output only. Additional state information, which includes status reported by the agent.
    Possible values:
    • UNSPECIFIED - The job substate is unknown.
    • SUBMITTED - The Job is submitted to the agent. Applies to RUNNING state.
    • QUEUED - The Job has been received and is awaiting execution (it may be waiting for a condition to be met). See the 'details' field for the reason for the delay. Applies to RUNNING state.
    • STALE_STATUS - The agent-reported status is out of date, which may be caused by a loss of communication between the agent and Dataproc. If the agent does not send a timely update, the job will fail. Applies to RUNNING state.

tags

Type: UNORDERED_LIST_STRING

trino_job

Type: STRUCT
Provider name: trinoJob
Description: Optional. Job is a Trino job.

  • client_tags
    Type: UNORDERED_LIST_STRING
    Provider name: clientTags
    Description: Optional. Trino client tags to attach to this query.

  • continue_on_failure
    Type: BOOLEAN
    Provider name: continueOnFailure
    Description: Optional. Whether to continue executing queries if a query fails. The default value is false. Setting to true can be useful when executing independent parallel queries.

  • logging_config
    Type: STRUCT
    Provider name: loggingConfig
    Description: Optional. The runtime log config for job execution.

  • output_format
    Type: STRING
    Provider name: outputFormat
    Description: Optional. The format in which query output will be displayed. See the Trino documentation for supported output formats.

  • query_file_uri
    Type: STRING
    Provider name: queryFileUri
    Description: The HCFS URI of the script that contains SQL queries.

  • query_list
    Type: STRUCT
    Provider name: queryList
    Description: A list of queries.

    • queries
      Type: UNORDERED_LIST_STRING
      Provider name: queries
      Description: Required. The queries to execute. You do not need to end a query expression with a semicolon. Multiple queries can be specified in one string by separating each with a semicolon. Here is an example of a Dataproc API snippet that uses a QueryList to specify a HiveJob: "hiveJob": { "queryList": { "queries": [ "query1", "query2", "query3;query4" ] } }
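
A hedged Trino job sketch with placeholder tags, output format, and query.

```python
# Hypothetical Trino job stanza; tags, format, and query are placeholders.
trino_job = {
    "client_tags": ["team:analytics"],
    "output_format": "JSON",  # assumed to be a format Trino supports; check the Trino docs
    "query_list": {"queries": ["SELECT 1"]},
}
```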

yarn_applications

Type: UNORDERED_LIST_STRUCT
Provider name: yarnApplications
Description: Output only. The collection of YARN applications spun up by this job. Beta Feature: This report is available for testing purposes only. It may be changed before final release.

  • name
    Type: STRING
    Provider name: name
    Description: Required. The application name.
  • progress
    Type: FLOAT
    Provider name: progress
    Description: Required. The numerical progress of the application, from 1 to 100.
  • state
    Type: STRING
    Provider name: state
    Description: Required. The application state.
    Possible values:
    • STATE_UNSPECIFIED - Status is unspecified.
    • NEW - Status is NEW.
    • NEW_SAVING - Status is NEW_SAVING.
    • SUBMITTED - Status is SUBMITTED.
    • ACCEPTED - Status is ACCEPTED.
    • RUNNING - Status is RUNNING.
    • FINISHED - Status is FINISHED.
    • FAILED - Status is FAILED.
    • KILLED - Status is KILLED.
  • tracking_url
    Type: STRING
    Provider name: trackingUrl
    Description: Optional. The HTTP URL of the ApplicationMaster, HistoryServer, or TimelineServer that provides application-specific information. The URL uses the internal hostname, and requires a proxy server for resolution and, possibly, access.
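
Tying the input fields above together, a submission through the google-cloud-dataproc Python client might look roughly like the following sketch; the project, region, cluster, and URIs are placeholders, and submit_job_as_operation follows the pattern in Google's published quickstart samples.

```python
# Minimal submission sketch; project, region, cluster, and URIs are placeholders.
from google.cloud import dataproc_v1

project_id = "example-project"
region = "us-central1"

job_client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

job = {
    "placement": {"cluster_name": "example-cluster"},
    "pyspark_job": {"main_python_file_uri": "gs://example-bucket/pyspark/main.py"},
    "scheduling": {"max_failures_per_hour": 5},
}

operation = job_client.submit_job_as_operation(
    request={"project_id": project_id, "region": region, "job": job}
)
result = operation.result()  # blocks until the job reaches a terminal state
print(result.status.state)
```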