Use CI jobs failure analysis to identify root causes in failed jobs
Overview
This guide explains how to use CI jobs failure analysis to determine the most common root cause of failed CI jobs. This can help improve the user experience with CI pipelines.
Understanding CI jobs failure analysis
CI Visibility uses an LLM to generate enhanced error messages and to categorize each one with a domain and subdomain, based on the relevant logs collected from every failed CI job.
How does CI Visibility identify the relevant logs of a CI job?
CI Visibility considers a log line relevant when it does not appear in the logs collected from previous successful executions of that CI job. Log relevancy is computed only for logs from failed CI jobs.
To check whether a log line was considered relevant, use the @relevant:true tag in the Log Explorer.
If a failed CI job has relevant logs, the LLM uses the last 100 relevant log lines as input. If a failed CI job has no relevant logs, CI Visibility sends the last 100 log lines instead.
Each log line is pre-scanned to redact any potentially sensitive information before being used.
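The selection rules above can be sketched in a few lines of Python. This is only an illustration, not CI Visibility's actual implementation: the function name `select_llm_input` is hypothetical, and it assumes a line "has appeared" before when an identical string exists in a previous successful run (the real matching logic is not documented here). Redaction is not shown.

```python
from typing import Iterable, List

MAX_LINES = 100  # per the text above: the last 100 lines are used


def select_llm_input(
    failed_job_logs: List[str],
    successful_runs_logs: Iterable[Iterable[str]],
    max_lines: int = MAX_LINES,
) -> List[str]:
    """Pick the log lines of a failed job that would be used as input.

    Assumption: relevance is decided by exact string comparison against
    every line seen in previous successful executions of the same job.
    """
    seen_in_success = {line for run in successful_runs_logs for line in run}
    relevant = [line for line in failed_job_logs if line not in seen_in_success]
    # Fall back to the raw log tail when no line is relevant.
    source = relevant if relevant else failed_job_logs
    return source[-max_lines:]


failed = ["Cloning repo", "Running tests", "Cannot connect to docker daemon"]
previous = [["Cloning repo", "Running tests", "All tests passed"]]
print(select_llm_input(failed, previous))
# prints ['Cannot connect to docker daemon']
```

Lines that also show up in successful runs (such as setup output) are filtered out, so the input concentrates on what is new in the failing execution.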
The LLM can classify errors with similar messages into distinct yet related subdomains. For example, the error message Cannot connect to docker daemon is usually categorized under domain:platform and subdomain:network. However, the LLM may sometimes classify it under subdomain:infrastructure instead.
Domains and Subdomains
Errors are categorized with a domain and subdomain:
Domains
Domain | Description |
---|---|
code | Failures caused by the code being built and tested in the CI pipeline. They should be fixed by the developer who modified the code. |
platform | Failures caused by reasons external to the code being built and tested. These failures can come from the CI provider, the underlying infrastructure, or external dependencies. They are not related to the developer's code changes and should usually be fixed by the team that owns the CI system. |
unknown | Used when the logs do not reveal a clear root cause of job failure. |
Subdomains
The subdomains available depend on the domain.
Subdomains of the code domain:
Subdomain | Cause | Examples |
---|---|---|
build | Compilation or build errors. | Compilation error in processor_test.go:28:50 |
test | Test failures. | 7 failed tests. Error: Can't find http.request.headers.x-amzn-trace-id in span's meta. |
quality | Format or linting failures. | Detected differences in files after running 'go fmt'. To fix, run 'go fmt' on the affected files and commit the changes. |
security | Security violations. | Security violation: Use of weak SHA1 hash for security. Consider usedforsecurity=False. |
Subdomains of the platform domain:
Subdomain | Cause | Examples |
---|---|---|
assembly | Errors in artifacts generation or assembly errors during a script execution. | Artifact generation failed due to rejected file 'domains/backend/cart-shopping-proto/mod.info' that exists in the repository. |
deployment | Errors during deployments, or related to deployments configurations. | Subprocess command returned non-zero exit status 1 during deployment config generation. |
infrastructure | Errors related to the infrastructure on which the job was executed. | Invalid docker image reference format for tag 'registry.gitlab.com/cart-shopping/infrastructure/backend-deploy-image:AE/create-kubectl-image'. |
network | Errors on connectivity with other dependencies. | Connection refused when accessing localhost:8080. |
credentials | Errors on authentication; missing or wrong credentials. | Failed to get image auth for docker.elastic.co. No credentials found. Unable to pull image 'docker.elastic.co/elasticsearch/elasticsearch:7.17.24'. |
dependencies | Errors on installing or updating dependencies required to execute the job. | Package 'systemd-container' cannot be installed. Depends on 'libsystemd-shared' v255.4-1ubuntu8.4 but v255.4-1ubuntu8.5 is to be installed. |
git | Errors executing git commands. | Automatic merge failed due to conflicts between branches 'cart-shopping-new-feature' and 'staging'. |
checks | Errors on required fulfillment of checks during the CI job execution. | Release note not found during changelog validation |
setup | Errors on setting up the CI job. | Execution failed during the TLS setup or client dialing process. |
script | Syntactic errors in the script in the CI job. | No tests ran due to file or directory not found. |
Subdomains of the unknown domain:
Subdomain | Description | Example |
---|---|---|
unknown | Error could not be categorized. | Job failed with exit code 1. View full logs or trace. |
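If you post-process categorized errors in your own tooling, the taxonomy above can be captured as a small lookup table. The sketch below is illustrative only: the names `SUBDOMAINS_BY_DOMAIN`, `SUGGESTED_OWNER`, and `is_valid_category` are hypothetical and not part of any Datadog API, and the owner strings paraphrase the domain descriptions.

```python
# Domain -> subdomains, transcribed from the tables above.
SUBDOMAINS_BY_DOMAIN = {
    "code": frozenset({"build", "test", "quality", "security"}),
    "platform": frozenset({
        "assembly", "deployment", "infrastructure", "network", "credentials",
        "dependencies", "git", "checks", "setup", "script",
    }),
    "unknown": frozenset({"unknown"}),
}

# Who would typically fix the failure, paraphrasing the domain descriptions.
SUGGESTED_OWNER = {
    "code": "developer who modified the code",
    "platform": "team that owns the CI system",
    "unknown": "needs manual triage (no clear root cause in the logs)",
}


def is_valid_category(domain: str, subdomain: str) -> bool:
    """Return True when the subdomain is listed under the given domain."""
    return subdomain in SUBDOMAINS_BY_DOMAIN.get(domain, frozenset())


print(is_valid_category("platform", "network"))  # prints True
print(is_valid_category("code", "network"))      # prints False
print(SUGGESTED_OWNER["platform"])               # prints team that owns the CI system
```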
Supported CI providers
CI jobs failure analysis is available for the following CI providers:
Note: You must enable CI job logs collection, and the logs need to be indexed. To set up CI job logs collection, select your CI provider on Pipeline Visibility and follow the instructions to collect job logs.
If you are interested in CI jobs failure analysis but your CI provider is not supported yet, fill out this form.
Identify the most recurrent errors in your CI pipelines
Using the CI Health page
CI Health provides a high-level overview of the health and performance of your CI pipelines. It helps DevOps and engineering teams monitor CI jobs, detect failures, and optimize build performance.
On this page, you can see a breakdown of the errors in your CI pipelines split by error domain. Click a CI pipeline, and check the Breakdown column in the Failed Executions section.
Using facets
Use the facets @error.message, @error.domain, and @error.subdomain to identify the most recurrent errors in your CI pipelines. Using those facets, you can create custom dashboards and notebooks.
These facets are only available when the query includes ci_level:job. If CI jobs failure analysis can’t be computed (for example, if you are not using a supported CI provider), these facets contain the error information coming from the CI provider.
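As a rough sketch of how these facets combine, the Python helpers below build a search query string and rank the most recurrent categories. Both function names are hypothetical. The query builder assumes that space-separated terms are combined with AND, and `top_errors` assumes job events have already been exported as flat dictionaries keyed by `error.domain` and `error.subdomain`, which is an illustrative shape rather than a documented export format.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple


def build_job_error_query(
    domain: Optional[str] = None, subdomain: Optional[str] = None
) -> str:
    """Build a query restricted to CI jobs, optionally filtered by category."""
    parts = ["ci_level:job"]  # the facets require ci_level:job in the query
    if domain:
        parts.append(f"@error.domain:{domain}")
    if subdomain:
        parts.append(f"@error.subdomain:{subdomain}")
    return " ".join(parts)


def top_errors(
    events: Iterable[Dict[str, str]], n: int = 3
) -> List[Tuple[Tuple[str, str], int]]:
    """Count (domain, subdomain) pairs and return the n most frequent."""
    counts = Counter(
        (e.get("error.domain", "unknown"), e.get("error.subdomain", "unknown"))
        for e in events
    )
    return counts.most_common(n)


print(build_job_error_query("platform", "network"))
# prints ci_level:job @error.domain:platform @error.subdomain:network

# Hypothetical exported events, for illustration only.
sample = [
    {"error.domain": "platform", "error.subdomain": "network"},
    {"error.domain": "platform", "error.subdomain": "network"},
    {"error.domain": "code", "error.subdomain": "test"},
]
print(top_errors(sample, n=1))
# prints [(('platform', 'network'), 2)]
```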
Using the dashboard template
You can import the CI Visibility - CI Jobs Failure Analysis dashboard template:
- Open the civisibility-ci-jobs-failure-analysis-dashboard.json dashboard template and copy the contents into the clipboard.
- Create a New Dashboard in Datadog.
- Paste the copied content into the new dashboard.
- Save the dashboard.