This check monitors Velero through the Datadog Agent. It collects data about Velero’s backup, restore and snapshot operations. This allows users to gain insight into the health, performance and reliability of their disaster recovery processes.
The Velero check is included in the Datadog Agent package. No additional installation is needed on your server.
Follow the instructions below to install and configure this check for an Agent running on a host.
Edit the velero.d/conf.yaml file, in the conf.d/ folder at the root of your Agent’s configuration directory, to start collecting your Velero performance data. See the sample velero.d/conf.yaml for all available configuration options.
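As a minimal sketch of what such a file might contain, assuming the check is OpenMetrics-based and that Velero serves metrics at http://localhost:8085/metrics (the option name, host, and port are assumptions here; confirm them against the sample velero.d/conf.yaml):

```yaml
# velero.d/conf.yaml (illustrative sketch, not the shipped sample file)
init_config:

instances:
  # Assumed endpoint: adjust the host and port to wherever Velero exposes its metrics.
  - openmetrics_endpoint: http://localhost:8085/metrics
```

After editing the file, restart the Agent so that the new configuration is loaded.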
See the Autodiscovery Integration Templates for guidance on configuring this integration in a containerized environment.
Note that two types of pods need to be queried for all metrics to be collected: velero and node-agent. Therefore, make sure to update the annotations of the velero deployment as well as the node-agent daemonset.
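The annotations on the two workloads could look roughly like the sketch below. It assumes the Autodiscovery v2 annotation format, container names of velero and node-agent, and a metrics endpoint on port 8085 at /metrics; none of these values come from this page, so verify each one against your own manifests.

```yaml
# Fragment of the velero Deployment pod template (assumed container name: velero)
spec:
  template:
    metadata:
      annotations:
        ad.datadoghq.com/velero.checks: |
          {
            "velero": {
              "init_config": {},
              "instances": [
                {"openmetrics_endpoint": "http://%%host%%:8085/metrics"}
              ]
            }
          }
---
# Fragment of the node-agent DaemonSet pod template (assumed container name: node-agent)
spec:
  template:
    metadata:
      annotations:
        ad.datadoghq.com/node-agent.checks: |
          {
            "velero": {
              "init_config": {},
              "instances": [
                {"openmetrics_endpoint": "http://%%host%%:8085/metrics"}
              ]
            }
          }
```

The %%host%% template variable is resolved by the Agent to the pod's IP address, so the same annotation works for every replica.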
The Velero integration can collect logs from the Velero pods.
To collect logs from Velero containers on a host:
Collecting logs is disabled by default in the Datadog Agent. Enable it in your datadog.yaml file:
logs_enabled: true
Uncomment and edit the logs configuration block in your velero.d/conf.yaml file. For example:
logs:
  - type: docker
    source: velero
    service: velero
To collect logs from a Velero Kubernetes deployment:
Collecting logs is disabled by default in the Datadog Agent. To enable it, see Kubernetes Log Collection.
Set Log Integrations as pod annotations. This can also be configured with a file, a ConfigMap, or a key-value store. For more information, see the configuration section of Kubernetes Log Collection.
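A log annotation for the Velero pods might look like the following sketch. The container name velero is an assumption, and the source and service values mirror the host example above; adjust them to match your deployment.

```yaml
# Fragment of the velero Deployment pod template (assumed container name: velero)
spec:
  template:
    metadata:
      annotations:
        ad.datadoghq.com/velero.logs: '[{"source": "velero", "service": "velero"}]'
```

Setting source to velero lets the matching log pipeline, if one is installed, be applied to these logs.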
Run the Agent’s status subcommand and look for velero under the Checks section.
velero.backup.amount (gauge) | Current number of existing backups |
velero.backup.attempt.count (count) | Total number of attempted backups |
velero.backup.deletion.attempt.count (count) | Total number of attempted backup deletions |
velero.backup.deletion.failure.count (count) | Total number of failed backup deletions |
velero.backup.deletion.success.count (count) | Total number of successful backup deletions |
velero.backup.duration.seconds.bucket (count) | Bucket for time taken to complete backup, in seconds |
velero.backup.duration.seconds.count (count) | Count aggregation for time taken to complete backup |
velero.backup.duration.seconds.sum (count) | Cumulative sum of time taken to complete backup, in seconds. Shown as second |
velero.backup.failure.count (count) | Total number of failed backups |
velero.backup.items (gauge) | Total number of items backed up |
velero.backup.items.errors (gauge) | Total number of errors encountered during backup. Shown as error |
velero.backup.last_status (gauge) | Last status of the backup. A value of 1 is success, 0 is failure |
velero.backup.last_successful_timestamp (gauge) | Last time a backup ran successfully, Unix timestamp in seconds |
velero.backup.partial_failure.count (count) | Total number of partially failed backups |
velero.backup.success.count (count) | Total number of successful backups |
velero.backup.tarball_size_bytes (gauge) | Size, in bytes, of a backup. Shown as byte |
velero.backup.validation_failure.count (count) | Total number of backups that failed validation |
velero.backup.warning.count (count) | Total number of backups with warnings |
velero.csi_snapshot.attempt.count (count) | Total number of attempted CSI volume snapshots |
velero.csi_snapshot.failure.count (count) | Total number of failed CSI volume snapshots |
velero.csi_snapshot.success.count (count) | Total number of successful CSI volume snapshots |
velero.pod_volume.backup.dequeue.count (count) | Total number of podvolumebackup objects dequeued |
velero.pod_volume.backup.enqueue.count (count) | Total number of podvolumebackup objects enqueued |
velero.pod_volume.data.download.cancel.count (count) | Total number of canceled snapshot downloads |
velero.pod_volume.data.download.failure.count (count) | Total number of failed snapshot downloads |
velero.pod_volume.data.download.success.count (count) | Total number of successful snapshot downloads |
velero.pod_volume.data.upload.cancel.count (count) | Total number of canceled snapshot uploads |
velero.pod_volume.data.upload.failure.count (count) | Total number of failed snapshot uploads |
velero.pod_volume.data.upload.success.count (count) | Total number of successful snapshot uploads |
velero.pod_volume.operation_latency.seconds.bucket (count) | Histogram bucket for time taken to complete pod volume operations, in seconds |
velero.pod_volume.operation_latency.seconds.count (count) | Count aggregation for time taken to complete pod volume operations |
velero.pod_volume.operation_latency.seconds.gauge (gauge) | Gauge metric indicating time taken, in seconds, to perform pod volume operations. Shown as second |
velero.pod_volume.operation_latency.seconds.sum (count) | Sum aggregation for time taken to complete pod volume operations, in seconds. Shown as second |
velero.restore.amount (gauge) | Current number of existing restores |
velero.restore.attempt.count (count) | Total number of attempted restores |
velero.restore.failed.count (count) | Total number of failed restores |
velero.restore.partial_failure.count (count) | Total number of partially failed restores |
velero.restore.success.count (count) | Total number of successful restores |
velero.restore.validation_failed.count (count) | Total number of restores that failed validation |
velero.volume_snapshot.attempt.count (count) | Total number of attempted volume snapshots |
velero.volume_snapshot.failure.count (count) | Total number of failed volume snapshots |
velero.volume_snapshot.success.count (count) | Total number of successful volume snapshots |
The Velero integration does not include any events.
The Velero integration does not include any service checks.
Make sure that your Velero server is exposing metrics by checking that the feature is enabled in the deployment configuration:
# Settings for Velero's prometheus metrics. Enabled by default.
metrics:
  enabled: true
  scrapeInterval: 30s
  scrapeTimeout: 10s
Need help? Contact Datadog support.