gcp_dataproc_job
ancestors
Type: UNORDERED_LIST_STRING
done
Type: BOOLEAN
Provider name: done
Description: Output only. Indicates whether the job is completed. If the value is false, the job is still in progress. If true, the job is completed, and the status.state field indicates whether it was successful, failed, or cancelled.
driver_control_files_uri
Type: STRING
Provider name: driverControlFilesUri
Description: Output only. If present, the location of miscellaneous control files which may be used as part of job setup and handling. If not present, control files may be placed in the same location as driver_output_uri.
driver_output_resource_uri
Type: STRING
Provider name: driverOutputResourceUri
Description: Output only. A URI pointing to the location of the stdout of the job’s driver program.
driver_scheduling_config
Type: STRUCT
Provider name: driverSchedulingConfig
Description: Optional. Driver scheduling configuration.
memory_mb
Type: INT32
Provider name: memoryMb
Description: Required. The amount of memory in MB the driver is requesting.
vcores
Type: INT32
Provider name: vcores
Description: Required. The number of vCPUs the driver is requesting.
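To make the shape of this struct concrete, here is a minimal sketch of a driverSchedulingConfig payload as it might appear in a job submission; the specific values are illustrative only:
  "driverSchedulingConfig": {
    "memoryMb": 2048,
    "vcores": 2
  }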
gcp_status
Type: STRUCT
Provider name: status
Description: Output only. The job status. Additional application-specific status information may be contained in the type_job and yarn_applications fields.
details
Type: STRING
Provider name: details
Description: Optional. Output only. Job state details, such as an error description if the state is ERROR.
state
Type: STRING
Provider name: state
Description: Output only. A state message specifying the overall job state.
Possible values:
STATE_UNSPECIFIED
- The job state is unknown.
PENDING
- The job is pending; it has been submitted, but is not yet running.
SETUP_DONE
- Job has been received by the service and completed initial setup; it will soon be submitted to the cluster.
RUNNING
- The job is running on the cluster.
CANCEL_PENDING
- A CancelJob request has been received, but is pending.
CANCEL_STARTED
- Transient in-flight resources have been canceled, and the request to cancel the running job has been issued to the cluster.
CANCELLED
- The job cancellation was successful.
DONE
- The job has completed successfully.
ERROR
- The job has completed, but encountered an error.
ATTEMPT_FAILURE
- Job attempt has failed. The detail field contains failure details for this attempt. Applies to restartable jobs only.
state_start_time
Type: TIMESTAMP
Provider name: stateStartTime
Description: Output only. The time when this state was entered.
substate
Type: STRING
Provider name: substate
Description: Output only. Additional state information, which includes status reported by the agent.
Possible values:
UNSPECIFIED
- The job substate is unknown.
SUBMITTED
- The Job is submitted to the agent. Applies to RUNNING state.
QUEUED
- The Job has been received and is awaiting execution (it may be waiting for a condition to be met). See the 'details' field for the reason for the delay. Applies to RUNNING state.
STALE_STATUS
- The agent-reported status is out of date, which may be caused by a loss of communication between the agent and Dataproc. If the agent does not send a timely update, the job will fail. Applies to RUNNING state.
hadoop_job
Type: STRUCT
Provider name: hadoopJob
Description: Optional. Job is a Hadoop job.
archive_uris
Type: UNORDERED_LIST_STRING
Provider name: archiveUris
Description: Optional. HCFS URIs of archives to be extracted in the working directory of Hadoop drivers and tasks. Supported file types: .jar, .tar, .tar.gz, .tgz, or .zip.
args
Type: UNORDERED_LIST_STRING
Provider name: args
Description: Optional. The arguments to pass to the driver. Do not include arguments, such as -libjars or -Dfoo=bar, that can be set as job properties, since a collision may occur that causes an incorrect job submission.
file_uris
Type: UNORDERED_LIST_STRING
Provider name: fileUris
Description: Optional. HCFS (Hadoop Compatible Filesystem) URIs of files to be copied to the working directory of Hadoop drivers and distributed tasks. Useful for naively parallel tasks.
jar_file_uris
Type: UNORDERED_LIST_STRING
Provider name: jarFileUris
Description: Optional. Jar file URIs to add to the CLASSPATHs of the Hadoop driver and tasks.
logging_config
Type: STRUCT
Provider name: loggingConfig
Description: Optional. The runtime log config for job execution.
main_class
Type: STRING
Provider name: mainClass
Description: The name of the driver’s main class. The jar file containing the class must be in the default CLASSPATH or specified in jar_file_uris.
main_jar_file_uri
Type: STRING
Provider name: mainJarFileUri
Description: The HCFS URI of the jar file containing the main class. Examples: 'gs://foo-bucket/analytics-binaries/extract-useful-metrics-mr.jar' 'hdfs:/tmp/test-samples/custom-wordcount.jar' 'file:///home/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar'
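Putting the fields above together, a hadoopJob payload might look like the following sketch; the bucket paths and arguments are hypothetical. Note that mainClass and mainJarFileUri are alternative ways to identify the driver, so a submission would typically set one or the other:
  "hadoopJob": {
    "mainJarFileUri": "gs://example-bucket/jobs/wordcount.jar",
    "args": ["gs://example-bucket/input/", "gs://example-bucket/output/"],
    "jarFileUris": ["gs://example-bucket/libs/extra-lib.jar"]
  }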
hive_job
Type: STRUCT
Provider name: hiveJob
Description: Optional. Job is a Hive job.
continue_on_failure
Type: BOOLEAN
Provider name: continueOnFailure
Description: Optional. Whether to continue executing queries if a query fails. The default value is false. Setting to true can be useful when executing independent parallel queries.
jar_file_uris
Type: UNORDERED_LIST_STRING
Provider name: jarFileUris
Description: Optional. HCFS URIs of jar files to add to the CLASSPATH of the Hive server and Hadoop MapReduce (MR) tasks. Can contain Hive SerDes and UDFs.
query_file_uri
Type: STRING
Provider name: queryFileUri
Description: The HCFS URI of the script that contains Hive queries.
query_list
Type: STRUCT
Provider name: queryList
Description: A list of queries.
queries
Type: UNORDERED_LIST_STRING
Provider name: queries
Description: Required. The queries to execute. You do not need to end a query expression with a semicolon. Multiple queries can be specified in one string by separating each with a semicolon. Here is an example of a Dataproc API snippet that uses a QueryList to specify a HiveJob: "hiveJob": { "queryList": { "queries": [ "query1", "query2", "query3;query4" ] } }
job_uuid
Type: STRING
Provider name: jobUuid
Description: Output only. A UUID that uniquely identifies a job within the project over time. This is in contrast to a user-settable reference.job_id that may be reused over time.
labels
Type: UNORDERED_LIST_STRING
organization_id
Type: STRING
parent
Type: STRING
pig_job
Type: STRUCT
Provider name: pigJob
Description: Optional. Job is a Pig job.
continue_on_failure
Type: BOOLEAN
Provider name: continueOnFailure
Description: Optional. Whether to continue executing queries if a query fails. The default value is false. Setting to true can be useful when executing independent parallel queries.
jar_file_uris
Type: UNORDERED_LIST_STRING
Provider name: jarFileUris
Description: Optional. HCFS URIs of jar files to add to the CLASSPATH of the Pig Client and Hadoop MapReduce (MR) tasks. Can contain Pig UDFs.
logging_config
Type: STRUCT
Provider name: loggingConfig
Description: Optional. The runtime log config for job execution.
query_file_uri
Type: STRING
Provider name: queryFileUri
Description: The HCFS URI of the script that contains the Pig queries.
query_list
Type: STRUCT
Provider name: queryList
Description: A list of queries.
queries
Type: UNORDERED_LIST_STRING
Provider name: queries
Description: Required. The queries to execute. You do not need to end a query expression with a semicolon. Multiple queries can be specified in one string by separating each with a semicolon. Here is an example of a Dataproc API snippet that uses a QueryList to specify a HiveJob: "hiveJob": { "queryList": { "queries": [ "query1", "query2", "query3;query4" ] } }
placement
Type: STRUCT
Provider name: placement
Description: Required. Job information, including how, when, and where to run the job.
cluster_name
Type: STRING
Provider name: clusterName
Description: Required. The name of the cluster where the job will be submitted.
cluster_uuid
Type: STRING
Provider name: clusterUuid
Description: Output only. A cluster UUID generated by the Dataproc service when the job is submitted.
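A minimal placement sketch; only clusterName is caller-supplied (clusterUuid is output only), and the cluster name here is hypothetical:
  "placement": {
    "clusterName": "example-cluster"
  }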
presto_job
Type: STRUCT
Provider name: prestoJob
Description: Optional. Job is a Presto job.
client_tags
Type: UNORDERED_LIST_STRING
Provider name: clientTags
Description: Optional. Presto client tags to attach to this query.
continue_on_failure
Type: BOOLEAN
Provider name: continueOnFailure
Description: Optional. Whether to continue executing queries if a query fails. The default value is false. Setting to true can be useful when executing independent parallel queries.
logging_config
Type: STRUCT
Provider name: loggingConfig
Description: Optional. The runtime log config for job execution.
output_format
Type: STRING
Provider name: outputFormat
Description: Optional. The format in which query output will be displayed. See the Presto documentation for supported output formats.
query_file_uri
Type: STRING
Provider name: queryFileUri
Description: The HCFS URI of the script that contains SQL queries.
query_list
Type: STRUCT
Provider name: queryList
Description: A list of queries.
queries
Type: UNORDERED_LIST_STRING
Provider name: queries
Description: Required. The queries to execute. You do not need to end a query expression with a semicolon. Multiple queries can be specified in one string by separating each with a semicolon. Here is an example of a Dataproc API snippet that uses a QueryList to specify a HiveJob: "hiveJob": { "queryList": { "queries": [ "query1", "query2", "query3;query4" ] } }
project_id
Type: STRING
project_number
Type: STRING
pyspark_job
Type: STRUCT
Provider name: pysparkJob
Description: Optional. Job is a PySpark job.
archive_uris
Type: UNORDERED_LIST_STRING
Provider name: archiveUris
Description: Optional. HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.
args
Type: UNORDERED_LIST_STRING
Provider name: args
Description: Optional. The arguments to pass to the driver. Do not include arguments, such as --conf, that can be set as job properties, since a collision may occur that causes an incorrect job submission.
file_uris
Type: UNORDERED_LIST_STRING
Provider name: fileUris
Description: Optional. HCFS URIs of files to be placed in the working directory of each executor. Useful for naively parallel tasks.
jar_file_uris
Type: UNORDERED_LIST_STRING
Provider name: jarFileUris
Description: Optional. HCFS URIs of jar files to add to the CLASSPATHs of the Python driver and tasks.
logging_config
Type: STRUCT
Provider name: loggingConfig
Description: Optional. The runtime log config for job execution.
main_python_file_uri
Type: STRING
Provider name: mainPythonFileUri
Description: Required. The HCFS URI of the main Python file to use as the driver. Must be a .py file.
python_file_uris
Type: UNORDERED_LIST_STRING
Provider name: pythonFileUris
Description: Optional. HCFS file URIs of Python files to pass to the PySpark framework. Supported file types: .py, .egg, and .zip.
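As a sketch, a pysparkJob payload built from the fields above might look like this; the paths and the argument are hypothetical:
  "pysparkJob": {
    "mainPythonFileUri": "gs://example-bucket/jobs/etl_driver.py",
    "pythonFileUris": ["gs://example-bucket/jobs/helpers.py"],
    "args": ["--date=2024-01-01"]
  }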
reference
Type: STRUCT
Provider name: reference
Description: Optional. The fully qualified reference to the job, which can be used to obtain the equivalent REST path of the job resource. If this property is not specified when a job is created, the server generates a job_id.
job_id
Type: STRING
Provider name: jobId
Description: Optional. The job ID, which must be unique within the project. The ID must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), or hyphens (-). The maximum length is 100 characters. If not specified by the caller, the job ID will be provided by the server.
project_id
Type: STRING
Provider name: projectId
Description: Optional. The ID of the Google Cloud Platform project that the job belongs to. If specified, must match the request project ID.
resource_name
Type: STRING
scheduling
Type: STRUCT
Provider name: scheduling
Description: Optional. Job scheduling configuration.
max_failures_per_hour
Type: INT32
Provider name: maxFailuresPerHour
Description: Optional. Maximum number of times per hour a driver may be restarted as a result of the driver exiting with a non-zero code before the job is reported failed. A job may be reported as thrashing if the driver exits with a non-zero code four times within a 10-minute window. Maximum value is 10. Note: This restartable job option is not supported in Dataproc workflow templates (https://cloud.google.com/dataproc/docs/concepts/workflows/using-workflows#adding_jobs_to_a_template).
max_failures_total
Type: INT32
Provider name: maxFailuresTotal
Description: Optional. Maximum total number of times a driver may be restarted as a result of the driver exiting with a non-zero code. After the maximum number is reached, the job will be reported as failed. Maximum value is 240. Note: Currently, this restartable job option is not supported in Dataproc workflow templates (https://cloud.google.com/dataproc/docs/concepts/workflows/using-workflows#adding_jobs_to_a_template).
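A scheduling sketch combining both limits; the values are illustrative and must respect the maximums noted above (10 per hour, 240 total):
  "scheduling": {
    "maxFailuresPerHour": 2,
    "maxFailuresTotal": 10
  }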
spark_job
Type: STRUCT
Provider name: sparkJob
Description: Optional. Job is a Spark job.
archive_uris
Type: UNORDERED_LIST_STRING
Provider name: archiveUris
Description: Optional. HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.
args
Type: UNORDERED_LIST_STRING
Provider name: args
Description: Optional. The arguments to pass to the driver. Do not include arguments, such as --conf, that can be set as job properties, since a collision may occur that causes an incorrect job submission.
file_uris
Type: UNORDERED_LIST_STRING
Provider name: fileUris
Description: Optional. HCFS URIs of files to be placed in the working directory of each executor. Useful for naively parallel tasks.
jar_file_uris
Type: UNORDERED_LIST_STRING
Provider name: jarFileUris
Description: Optional. HCFS URIs of jar files to add to the CLASSPATHs of the Spark driver and tasks.
logging_config
Type: STRUCT
Provider name: loggingConfig
Description: Optional. The runtime log config for job execution.
main_class
Type: STRING
Provider name: mainClass
Description: The name of the driver’s main class. The jar file that contains the class must be in the default CLASSPATH or specified in jar_file_uris.
main_jar_file_uri
Type: STRING
Provider name: mainJarFileUri
Description: The HCFS URI of the jar file that contains the main class.
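A sparkJob sketch using mainClass rather than mainJarFileUri (as with Hadoop jobs, the two are alternative ways to specify the driver); the class and jar names are hypothetical:
  "sparkJob": {
    "mainClass": "com.example.SparkWordCount",
    "jarFileUris": ["gs://example-bucket/jobs/spark-wordcount.jar"],
    "args": ["gs://example-bucket/input/"]
  }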
spark_r_job
Type: STRUCT
Provider name: sparkRJob
Description: Optional. Job is a SparkR job.
archive_uris
Type: UNORDERED_LIST_STRING
Provider name: archiveUris
Description: Optional. HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.
args
Type: UNORDERED_LIST_STRING
Provider name: args
Description: Optional. The arguments to pass to the driver. Do not include arguments, such as --conf, that can be set as job properties, since a collision may occur that causes an incorrect job submission.
file_uris
Type: UNORDERED_LIST_STRING
Provider name: fileUris
Description: Optional. HCFS URIs of files to be placed in the working directory of each executor. Useful for naively parallel tasks.
logging_config
Type: STRUCT
Provider name: loggingConfig
Description: Optional. The runtime log config for job execution.
main_r_file_uri
Type: STRING
Provider name: mainRFileUri
Description: Required. The HCFS URI of the main R file to use as the driver. Must be a .R file.
spark_sql_job
Type: STRUCT
Provider name: sparkSqlJob
Description: Optional. Job is a SparkSql job.
jar_file_uris
Type: UNORDERED_LIST_STRING
Provider name: jarFileUris
Description: Optional. HCFS URIs of jar files to be added to the Spark CLASSPATH.
logging_config
Type: STRUCT
Provider name: loggingConfig
Description: Optional. The runtime log config for job execution.
query_file_uri
Type: STRING
Provider name: queryFileUri
Description: The HCFS URI of the script that contains SQL queries.
query_list
Type: STRUCT
Provider name: queryList
Description: A list of queries.
queries
Type: UNORDERED_LIST_STRING
Provider name: queries
Description: Required. The queries to execute. You do not need to end a query expression with a semicolon. Multiple queries can be specified in one string by separating each with a semicolon. Here is an example of a Dataproc API snippet that uses a QueryList to specify a HiveJob: "hiveJob": { "queryList": { "queries": [ "query1", "query2", "query3;query4" ] } }
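For completeness, a sparkSqlJob can reference a script instead of an inline query list; queryFileUri and queryList are alternative ways to supply the queries. This sketch uses a hypothetical script path:
  "sparkSqlJob": {
    "queryFileUri": "gs://example-bucket/queries/report.sql"
  }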
status_history
Type: UNORDERED_LIST_STRUCT
Provider name: statusHistory
Description: Output only. The previous job status.
details
Type: STRING
Provider name: details
Description: Optional. Output only. Job state details, such as an error description if the state is ERROR.
state
Type: STRING
Provider name: state
Description: Output only. A state message specifying the overall job state.
Possible values:
STATE_UNSPECIFIED
- The job state is unknown.
PENDING
- The job is pending; it has been submitted, but is not yet running.
SETUP_DONE
- Job has been received by the service and completed initial setup; it will soon be submitted to the cluster.
RUNNING
- The job is running on the cluster.
CANCEL_PENDING
- A CancelJob request has been received, but is pending.
CANCEL_STARTED
- Transient in-flight resources have been canceled, and the request to cancel the running job has been issued to the cluster.
CANCELLED
- The job cancellation was successful.
DONE
- The job has completed successfully.
ERROR
- The job has completed, but encountered an error.
ATTEMPT_FAILURE
- Job attempt has failed. The detail field contains failure details for this attempt. Applies to restartable jobs only.
state_start_time
Type: TIMESTAMP
Provider name: stateStartTime
Description: Output only. The time when this state was entered.
substate
Type: STRING
Provider name: substate
Description: Output only. Additional state information, which includes status reported by the agent.
Possible values:
UNSPECIFIED
- The job substate is unknown.
SUBMITTED
- The Job is submitted to the agent. Applies to RUNNING state.
QUEUED
- The Job has been received and is awaiting execution (it may be waiting for a condition to be met). See the 'details' field for the reason for the delay. Applies to RUNNING state.
STALE_STATUS
- The agent-reported status is out of date, which may be caused by a loss of communication between the agent and Dataproc. If the agent does not send a timely update, the job will fail. Applies to RUNNING state.
tags
Type: UNORDERED_LIST_STRING
trino_job
Type: STRUCT
Provider name: trinoJob
Description: Optional. Job is a Trino job.
client_tags
Type: UNORDERED_LIST_STRING
Provider name: clientTags
Description: Optional. Trino client tags to attach to this query.
continue_on_failure
Type: BOOLEAN
Provider name: continueOnFailure
Description: Optional. Whether to continue executing queries if a query fails. The default value is false. Setting to true can be useful when executing independent parallel queries.
logging_config
Type: STRUCT
Provider name: loggingConfig
Description: Optional. The runtime log config for job execution.
output_format
Type: STRING
Provider name: outputFormat
Description: Optional. The format in which query output will be displayed. See the Trino documentation for supported output formats.
query_file_uri
Type: STRING
Provider name: queryFileUri
Description: The HCFS URI of the script that contains SQL queries.
query_list
Type: STRUCT
Provider name: queryList
Description: A list of queries.
queries
Type: UNORDERED_LIST_STRING
Provider name: queries
Description: Required. The queries to execute. You do not need to end a query expression with a semicolon. Multiple queries can be specified in one string by separating each with a semicolon. Here is an example of a Dataproc API snippet that uses a QueryList to specify a HiveJob: "hiveJob": { "queryList": { "queries": [ "query1", "query2", "query3;query4" ] } }
yarn_applications
Type: UNORDERED_LIST_STRUCT
Provider name: yarnApplications
Description: Output only. The collection of YARN applications spun up by this job. Beta Feature: This report is available for testing purposes only. It may be changed before final release.
name
Type: STRING
Provider name: name
Description: Required. The application name.
progress
Type: FLOAT
Provider name: progress
Description: Required. The numerical progress of the application, from 1 to 100.
state
Type: STRING
Provider name: state
Description: Required. The application state.
Possible values:
STATE_UNSPECIFIED
- Status is unspecified.
NEW
- Status is NEW.
NEW_SAVING
- Status is NEW_SAVING.
SUBMITTED
- Status is SUBMITTED.
ACCEPTED
- Status is ACCEPTED.
RUNNING
- Status is RUNNING.
FINISHED
- Status is FINISHED.
FAILED
- Status is FAILED.
KILLED
- Status is KILLED.
tracking_url
Type: STRING
Provider name: trackingUrl
Description: Optional. The HTTP URL of the ApplicationMaster, HistoryServer, or TimelineServer that provides application-specific information. The URL uses the internal hostname, and requires a proxy server for resolution and, possibly, access.
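Tying the top-level fields together, a complete job submission body might look like the following sketch; all project, cluster, and path names are hypothetical, and output-only fields (done, gcp_status, job_uuid, yarn_applications, and so on) are omitted because they are populated by the service:
  {
    "reference": { "projectId": "example-project", "jobId": "word-count-001" },
    "placement": { "clusterName": "example-cluster" },
    "pysparkJob": { "mainPythonFileUri": "gs://example-bucket/jobs/word_count.py" },
    "scheduling": { "maxFailuresPerHour": 1 }
  }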