Pipeline Data Model And Execution Types
CI Visibility is not available in the selected site ().
Overview
This guide describes how to programmatically set up pipeline executions in CI Visibility and defines the types of pipeline execution that CI Visibility supports.
This guide applies to pipelines created using the public CI Visibility Pipelines API. Integrations with other CI providers may vary.
Data model
Pipeline executions are modeled as traces, similar to an APM distributed trace, where spans represent the execution of different parts of the pipeline. The CI Visibility data model for representing pipeline executions consists of four levels:
Level Name | Description |
---|
Pipeline (required) | The top-level root span that contains all other levels as children. It represents the overall execution of a pipeline from start to finish. This level is sometimes called build or workflow in some CI providers. |
Stage | Serves as a grouping of jobs under a user-defined name. Some CI providers do not have this level. |
Job | The smallest unit of work where commands are executed. All tasks at this level should be performed on a single node. |
Step | In some CI providers, this level represents a shell script or an action executed within a job. |
When a pipeline, stage, job, or step finishes, execution data is sent to Datadog. To set up Pipeline Visibility, see the list of supported CI providers. If your CI provider or workflow is not supported, you can use the public API endpoint to send your pipeline executions to CI Visibility.
Stages, jobs, and steps are expected to have the exact same pipeline name as their parent pipeline. In the case of a mismatch, some pipelines may be missing stage, job, and step information. For example, missing jobs in the job summary tables.
Pipeline unique IDs
All pipeline executions within a level must have an unique identifier. For example, a pipeline and a job may have the same unique ID, but not two pipelines.
When sending repeated IDs with different timestamps, the user interface may exhibit undesirable behavior. For example, flame graphs may display span tags from a different pipeline execution. If duplicate IDs with the same timestamps are sent, only the values of the last pipeline execution received are stored.
Pipeline execution types
Normal run
The normal run of a pipeline execution follows the flow depicted below:
Depending on the provider, some levels may be missing. For example, stages may not exist, and jobs may run in parallel or sequence, or a combination of both.
After the completion of each component, a payload must be sent to Datadog with all the necessary data to represent the execution. Datadog processes this data, stores it as a pipeline event, and displays it in CI Visibility. Pipeline executions must end before sending them to Datadog.
Full retries
Full retries of a pipeline must have different pipeline unique IDs.
In the public API endpoint, you can populate the previous_attempt
field to link to previous retries. Retries are treated as separate pipeline executions in Datadog, and the start and end time should only encompass that retry.
Partial retries
When retrying a subset of jobs within a pipeline, you must send a new pipeline event with a new pipeline unique ID. The payload for any new jobs must be linked to the new pipeline unique ID. To link them to the previous retry, add the previous_attempt
field.
Partial retries are treated as separate pipelines as well. The start and end time must not include the time of the original retry. For a partial retry, do not send payloads for jobs that ran in the previous attempt. Also, set the partial_retry
field to true
on partial retries to exclude them from aggregation when calculating run times.
For example, a pipeline named P
has three different jobs, namely J1
, J2
, and J3
, executed sequentially. In the first run of P
, only J1
and J2
are executed, and J2
fails.
Therefore, you need to send a total of three payloads:
- Job payload for
J1
, with ID J1_1
and pipeline ID P_1
. - Job payload for
J2
, with ID J2_1
and pipeline ID P_1
. - Pipeline payload for
P
, with ID P_1
.
Suppose there is a partial retry of the pipeline starting from J2
, where all the remaining jobs succeed.
You need to send three additional payloads:
- Job payload for
J2
, with ID J2_2
and pipeline ID P_2
. - Job payload for
J3
, with ID J3_1
and pipeline ID P_2
. - Pipeline payload for
P
, with ID P_2
.
The actual values of the IDs are not important. What matters is that they are correctly modified based on the pipeline run as specified above.
Blocked pipelines
If a pipeline is indefinitely blocked due to requiring manual intervention, a pipeline event payload must be sent as soon as the pipeline reaches the blocked state. The pipeline status must be set to blocked
.
The remaining pipeline data must be sent in separate payloads with a different pipeline unique ID. In the second pipeline, you can set is_resumed
to true
to signal that the execution was resumed from a blocked pipeline.
Downstream pipelines
If a pipeline is triggered as a child of another pipeline, the parent_pipeline
field should be populated with details of the parent pipeline.
Manual pipelines
If a pipeline is triggered manually, the is_manual
field must be set to true.
Providing Git information of the commit that triggered the pipeline execution is strongly encouraged. Pipeline executions without Git information don’t appear on the Recent Code Changes page. At a minimum, the repository URL, commit SHA, and author email are required. For more information, see the public API endpoint specification.
Further reading
Additional helpful documentation, links, and articles: