(LEGACY) High Availability and Disaster Recovery
Observability Pipelines is not available on the US1-FED Datadog site.
This guide is for large-scale production-level deployments.
In the context of Observability Pipelines, high availability means that the Observability Pipelines Worker remains available when system issues occur.
To achieve high availability:
- Deploy at least two Observability Pipelines Worker instances in each Availability Zone.
- Deploy Observability Pipelines Worker in at least two Availability Zones.
- Front your Observability Pipelines Worker instances with a load balancer that distributes traffic across them. See Capacity Planning and Scaling for more information.
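The three rules above can be checked mechanically against a deployment plan. The following is a minimal sketch, not a Datadog tool: the function `ha_violations`, the constants, and the instance and zone names are all hypothetical, and it assumes the plan is described as a list of `(instance_id, availability_zone)` pairs.

```python
from collections import Counter

# Thresholds taken from the high-availability rules above.
MIN_INSTANCES_PER_ZONE = 2
MIN_ZONES = 2


def ha_violations(instances, has_load_balancer):
    """Return human-readable violations of the HA rules (hypothetical helper).

    `instances` is a list of (instance_id, availability_zone) pairs.
    An empty list means the plan satisfies all three rules.
    """
    violations = []
    per_zone = Counter(zone for _, zone in instances)
    if len(per_zone) < MIN_ZONES:
        violations.append(
            f"only {len(per_zone)} availability zone(s); need at least {MIN_ZONES}"
        )
    for zone, count in sorted(per_zone.items()):
        if count < MIN_INSTANCES_PER_ZONE:
            violations.append(
                f"zone {zone} has {count} instance(s); need at least {MIN_INSTANCES_PER_ZONE}"
            )
    if not has_load_balancer:
        violations.append("no load balancer in front of the Worker instances")
    return violations


# Example plan: two zones, but the second zone has only one instance.
plan = [("opw-1", "us-east-1a"), ("opw-2", "us-east-1a"), ("opw-3", "us-east-1b")]
print(ha_violations(plan, has_load_balancer=True))
# ['zone us-east-1b has 1 instance(s); need at least 2']
```

Adding a second instance in `us-east-1b` would make the function return an empty list.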
Mitigating failure scenarios
Handling Observability Pipelines Worker process issues
To mitigate a system process issue, distribute the Observability Pipelines Worker across multiple nodes and front them with a network load balancer that can redirect traffic to another Observability Pipelines Worker instance as needed. In addition, platform-level automated self-healing should eventually restart the process or replace the node.
Mitigating node failures
To mitigate node issues, distribute the Observability Pipelines Worker across multiple nodes and front them with a network load balancer that can redirect traffic to another Observability Pipelines Worker node. In addition, platform-level automated self-healing should eventually replace the node.
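Both the process and node scenarios above rely on the load balancer steering traffic away from a failed Worker. The sketch below is a toy model of that behavior, not how any particular load balancer is implemented: the class `HealthAwareBalancer` and the instance names are hypothetical, and a real balancer would mark targets down through periodic health checks rather than explicit calls.

```python
import itertools


class HealthAwareBalancer:
    """Round-robin over Worker instances, skipping ones marked unhealthy.

    Hypothetical toy model; real network load balancers mark targets
    unhealthy based on periodic health checks.
    """

    def __init__(self, instances):
        self.instances = list(instances)
        self.healthy = set(self.instances)
        self._cycle = itertools.cycle(self.instances)

    def mark_down(self, instance):
        # Called when a process crashes or a node is lost.
        self.healthy.discard(instance)

    def mark_up(self, instance):
        # Called after self-healing restarts the process or replaces the node.
        self.healthy.add(instance)

    def pick(self):
        # Visit each instance at most once per call, skipping unhealthy ones.
        for _ in range(len(self.instances)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy Observability Pipelines Worker instances")


lb = HealthAwareBalancer(["opw-1", "opw-2", "opw-3"])
lb.mark_down("opw-2")  # e.g. the Worker process on opw-2 crashed
print([lb.pick() for _ in range(4)])
# ['opw-1', 'opw-3', 'opw-1', 'opw-3']
```

Traffic keeps flowing to the remaining instances while `opw-2` is down, which is why at least two instances are needed in the first place.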
Handling availability zone failures
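To mitigate issues with Availability Zones, deploy the Observability Pipelines Worker across multiple Availability Zones so that, if one zone fails, the load balancer can send traffic to Worker instances in the remaining zones.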
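Spreading Workers across zones only helps if the surviving zones have room to absorb the failed zone's traffic. The worked example below is an illustrative sketch under simplifying assumptions (traffic spread evenly, the same number of instances in every zone); the function name is hypothetical, and real sizing guidance is in Capacity Planning and Scaling. Under those assumptions, losing one of `zones` zones raises per-instance utilization by a factor of `zones / (zones - 1)`.

```python
def utilization_after_zone_loss(steady_utilization, zones):
    """Per-instance utilization after one Availability Zone fails.

    Hypothetical helper. Assumes traffic is spread evenly and every zone
    runs the same number of instances, so the surviving zones absorb the
    lost zone's share equally.
    """
    if zones < 2:
        raise ValueError("losing the only zone leaves no Worker instances")
    return steady_utilization * zones / (zones - 1)


print(utilization_after_zone_loss(0.40, zones=2))            # 0.8
print(round(utilization_after_zone_loss(0.60, zones=3), 2))  # 0.9
```

With two zones, each instance must run at or below 50% in steady state to survive a zone loss without overload; adding a third zone raises that ceiling to about 67%.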
Mitigating region failures
Observability Pipelines Worker is designed to route internal observability data, so it should not fail over to another region. Instead, deploy Observability Pipelines Worker in each of your regions. As a result, if your entire network or region fails, the Observability Pipelines Worker in that region fails with it. See Networking for more information.
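The per-region model can be sketched as a lookup that resolves only to the Worker in the caller's own region and deliberately has no cross-region fallback. This is an illustration, not an Observability Pipelines API: `WORKER_ENDPOINTS`, `worker_endpoint`, and the hostnames are hypothetical placeholders.

```python
# Hypothetical per-region load balancer addresses for the Worker.
WORKER_ENDPOINTS = {
    "us-east-1": "opw.us-east-1.example.internal",
    "eu-west-1": "opw.eu-west-1.example.internal",
}


def worker_endpoint(region):
    """Return the Worker endpoint in the caller's own region.

    There is deliberately no fallback to another region: if a region has
    no Worker, data from that region is not rerouted elsewhere.
    """
    try:
        return WORKER_ENDPOINTS[region]
    except KeyError:
        raise LookupError(
            f"no Observability Pipelines Worker deployed in {region}"
        ) from None


print(worker_endpoint("eu-west-1"))  # opw.eu-west-1.example.internal
```

A `LookupError` here signals a missing regional deployment to fix, rather than a condition to route around.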
Disaster recovery
Internal disaster recovery
Observability Pipelines Worker is an infrastructure-level tool designed to route internal observability data. It implements a shared-nothing architecture and does not manage state that needs to be replicated or transferred to a disaster recovery (DR) site. If your entire region fails, Observability Pipelines Worker fails with it. For this reason, install the Observability Pipelines Worker in your DR site as part of your broader DR plan.
External disaster recovery
If you’re using a managed destination, such as Datadog, Observability Pipelines Worker can automatically route data to your Datadog DR site using its circuit breaker feature.
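To illustrate the idea behind circuit-breaker failover, here is a minimal sketch of the general pattern. It does not show how the Worker's circuit breaker is configured or what thresholds it uses; the class `FailoverCircuitBreaker`, its parameters, and the destination labels `datadog-primary` and `datadog-dr` are hypothetical. After `failure_threshold` consecutive failures to the primary destination the circuit opens and data goes to the DR destination; once `reset_after` seconds pass, the primary is probed again.

```python
import time


class FailoverCircuitBreaker:
    """Generic circuit-breaker failover between a primary and a DR destination.

    Hypothetical sketch of the pattern, not the Worker's implementation.
    """

    def __init__(self, primary, dr, failure_threshold=3, reset_after=30.0,
                 clock=time.monotonic):
        self.primary = primary
        self.dr = dr
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.consecutive_failures = 0
        self.opened_at = None  # None means the circuit is closed

    def destination(self):
        if self.opened_at is None:
            return self.primary
        if self.clock() - self.opened_at >= self.reset_after:
            return self.primary  # half-open: probe the primary again
        return self.dr

    def record(self, destination, ok):
        if destination != self.primary:
            return  # only primary outcomes drive the breaker
        if ok:
            self.consecutive_failures = 0
            self.opened_at = None  # close the circuit
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit


# Fake clock so the example is deterministic.
now = [0.0]
cb = FailoverCircuitBreaker("datadog-primary", "datadog-dr",
                            failure_threshold=2, reset_after=10.0,
                            clock=lambda: now[0])
for _ in range(2):
    cb.record(cb.destination(), ok=False)
print(cb.destination())  # datadog-dr
now[0] = 10.0
print(cb.destination())  # datadog-primary (half-open probe)
```

If the half-open probe fails, the circuit re-opens and traffic returns to the DR destination; if it succeeds, the circuit closes.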