Observability Pipelines is not available on the US1-FED Datadog site.

Upgrading from OP Worker version 1.8 or below to version 2.0 or above breaks your existing pipelines. If you are running OP Worker version 1.8 or below and want to keep using it, do not upgrade. If you want to use OP Worker 2.0 or above, you must migrate your OP Worker 1.8 or below pipelines to OP Worker 2.x.

Datadog recommends upgrading to OP Worker version 2.0 or above. Upgrading to a major OP Worker version and keeping it updated is the only way to get the latest OP Worker features, bug fixes, and security updates.

Overview

The Observability Pipelines Worker can collect, process, and route logs from any source to any destination. Using Datadog, you can build and manage all of your Observability Pipelines Worker deployments at scale.

This guide walks you through deploying the Worker in your common tools cluster and configuring the Datadog Agent to send logs and metrics to the Worker.

A diagram of a couple of workload clusters sending their data through the Observability Pipelines aggregator.

Deployment Modes

Remote Configuration for Observability Pipelines is in private beta. Contact Datadog support or your Customer Success Manager for access.

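If you are enrolled in the Remote Configuration private beta, you can roll out changes to your Workers remotely from the Datadog UI, instead of updating your pipeline configuration in a text editor and then rolling out the changes manually. Choose your deployment method when you create a pipeline and install your Workers.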

See Updating deployment modes for how to change the deployment mode after a pipeline is deployed.

Assumptions

  • You are already using Datadog and want to use Observability Pipelines.
  • You have administrative access to the clusters where the Observability Pipelines Worker is going to be deployed, as well as to the workloads that are going to be aggregated.
  • You have a common tools or security cluster for your environment to which all other clusters are connected.

Prerequisites

Before installing, make sure you have:

  • A valid Datadog API key.
  • A Pipeline ID.

You can generate both of these in Observability Pipelines.

Provider-specific requirements

For Docker installs, ensure that your machine is configured to run Docker.

To run the Worker on your Kubernetes nodes (AWS EKS, Azure AKS, or Google GKE), you need a minimum of two nodes with one CPU and 512MB RAM available. Datadog recommends creating a separate node pool for the Workers, which is also the recommended configuration for production deployments.

On AWS EKS, the following also apply:

  • The EBS CSI driver is required. To see if it is installed, run the following command and look for ebs-csi-controller in the list:

    kubectl get pods -n kube-system
    
  • A StorageClass is required for the Workers to provision the correct EBS drives. To see if it is installed already, run the following command and look for io2 in the list:

    kubectl get storageclass
    

    If io2 is not present, download the StorageClass YAML and kubectl apply it.

  • The AWS Load Balancer controller is required. To see if it is installed, run the following command and look for aws-load-balancer-controller in the list:

    helm list -A
    
  • Datadog recommends using Amazon EKS >= 1.16.
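
The three checks above can be combined into one script. The following is a minimal sketch, not an official Datadog tool: `has_entry` and `report` are hypothetical helper names, and they only look for the names mentioned above in whatever output is piped to them.

```shell
#!/bin/sh
# has_entry NAME: succeeds if any line read from stdin contains NAME.
# Hypothetical helper for illustration; not part of kubectl or helm.
has_entry() {
  grep -q -- "$1"
}

# report LABEL NAME: reads command output on stdin and prints whether
# NAME appears in it.
report() {
  if has_entry "$2"; then
    echo "$1: found"
  else
    echo "$1: MISSING"
  fi
}

# Example usage (requires kubectl and helm access to the cluster):
#   kubectl get pods -n kube-system | report "EBS CSI driver" ebs-csi-controller
#   kubectl get storageclass        | report "io2 StorageClass" io2
#   helm list -A                    | report "AWS Load Balancer controller" aws-load-balancer-controller
```

A MISSING line only means the name was not found in that command's output. Confirm manually before installing anything.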

See Best Practices for OPW Aggregator Architecture for production-level requirements.

There are no provider-specific requirements for APT-based Linux.

There are no provider-specific requirements for RPM-based Linux.

To run the Worker in your AWS account with Terraform or CloudFormation, you need administrative access to that account. Collect the following pieces of information to run the Worker instances:

  • The VPC ID your instances will run in.
  • The subnet IDs your instances will run in.
  • The AWS region your VPC is located in.

CloudFormation installs only support Remote Configuration. Only use CloudFormation installs for non-production-level workloads.

Installing the Observability Pipelines Worker

The Observability Pipelines Worker Docker image is published to Docker Hub as datadog/observability-pipelines-worker.

  1. Download the sample pipeline configuration file.

  2. Run the following command to start the Observability Pipelines Worker with Docker:

    docker run -i -e DD_API_KEY=<API_KEY> \
      -e DD_OP_PIPELINE_ID=<PIPELINE_ID> \
      -e DD_SITE=<SITE> \
      -p 8282:8282 \
      -v ./pipeline.yaml:/etc/observability-pipelines-worker/pipeline.yaml:ro \
      datadog/observability-pipelines-worker run
    

    Replace <API_KEY> with your Datadog API key, <PIPELINE_ID> with your Observability Pipelines configuration ID, and <SITE> with your Datadog site. ./pipeline.yaml must be the relative or absolute path to the configuration you downloaded in Step 1.
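
Before running the command, it can help to confirm that no placeholder was left unreplaced. The following is a minimal sketch under stated assumptions: `require_value` is a hypothetical helper, and it assumes you export the three values as shell variables with the same names as the container's environment variables.

```shell
#!/bin/sh
# require_value NAME VALUE: fails if VALUE is empty or is still an
# unreplaced <PLACEHOLDER>. Hypothetical helper, for illustration only.
require_value() {
  case "$2" in
    ""|"<"*">")
      echo "error: $1 is not set" >&2
      return 1
      ;;
  esac
}

# Example usage:
#   require_value DD_API_KEY "$DD_API_KEY" &&
#   require_value DD_OP_PIPELINE_ID "$DD_OP_PIPELINE_ID" &&
#   require_value DD_SITE "$DD_SITE" &&
#   echo "all values set; ready to run the docker command above"
```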

  1. Download the Helm chart values file for AWS EKS.

  2. In the Helm chart, replace the datadog.apiKey and datadog.pipelineId values to match your pipeline, and set the site value to your Datadog site. Then, install it in your cluster with the following commands:

    helm repo add datadog https://helm.datadoghq.com
    
    helm repo update
    
    helm upgrade --install \
        opw datadog/observability-pipelines-worker \
        -f aws_eks.yaml
    
  1. Download the Helm chart values file for Azure AKS.

  2. In the Helm chart, replace the datadog.apiKey and datadog.pipelineId values to match your pipeline, and set the site value to your Datadog site. Then, install it in your cluster with the following commands:

    helm repo add datadog https://helm.datadoghq.com
    
    helm repo update
    
    helm upgrade --install \
      opw datadog/observability-pipelines-worker \
      -f azure_aks.yaml
    
  1. Download the Helm chart values file for Google GKE.

  2. In the Helm chart, replace the datadog.apiKey and datadog.pipelineId values to match your pipeline, and set the site value to your Datadog site. Then, install it in your cluster with the following commands:

    helm repo add datadog https://helm.datadoghq.com
    
    helm repo update
    
    helm upgrade --install \
      opw datadog/observability-pipelines-worker \
      -f google_gke.yaml
    
  1. Run the following commands to set up APT to download through HTTPS:

    sudo apt-get update
    sudo apt-get install apt-transport-https curl gnupg
    
  2. Run the following commands to set up the Datadog deb repo on your system and create a Datadog archive keyring:

    sudo sh -c "echo 'deb [signed-by=/usr/share/keyrings/datadog-archive-keyring.gpg] https://apt.datadoghq.com/ stable observability-pipelines-worker-1' > /etc/apt/sources.list.d/datadog-observability-pipelines-worker.list"
    sudo touch /usr/share/keyrings/datadog-archive-keyring.gpg
    sudo chmod a+r /usr/share/keyrings/datadog-archive-keyring.gpg
    curl https://keys.datadoghq.com/DATADOG_APT_KEY_CURRENT.public | sudo gpg --no-default-keyring --keyring /usr/share/keyrings/datadog-archive-keyring.gpg --import --batch
    curl https://keys.datadoghq.com/DATADOG_APT_KEY_06462314.public | sudo gpg --no-default-keyring --keyring /usr/share/keyrings/datadog-archive-keyring.gpg --import --batch
    curl https://keys.datadoghq.com/DATADOG_APT_KEY_F14F620E.public | sudo gpg --no-default-keyring --keyring /usr/share/keyrings/datadog-archive-keyring.gpg --import --batch
    curl https://keys.datadoghq.com/DATADOG_APT_KEY_C0962C7D.public | sudo gpg --no-default-keyring --keyring /usr/share/keyrings/datadog-archive-keyring.gpg --import --batch
    
  3. Run the following commands to update your local apt repo and install the Worker:

    sudo apt-get update
    sudo apt-get install observability-pipelines-worker datadog-signing-keys
    
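  4. Add your API key, pipeline ID, and Datadog site to the Worker’s environment variables. The file is written through sudo tee because a plain shell redirect (>) runs with your own permissions, not with sudo’s:

    sudo tee /etc/default/observability-pipelines-worker > /dev/null <<EOF
    DD_API_KEY=<API_KEY>
    DD_OP_PIPELINE_ID=<PIPELINE_ID>
    DD_SITE=<SITE>
    EOF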
    
  5. Download the sample configuration file to /etc/observability-pipelines-worker/pipeline.yaml on the host.

  6. Start the worker:

    sudo systemctl restart observability-pipelines-worker
    
  1. Run the following commands to set up the Datadog rpm repo on your system:

    cat <<EOF > /etc/yum.repos.d/datadog-observability-pipelines-worker.repo
    [observability-pipelines-worker]
    name = Observability Pipelines Worker
    baseurl = https://yum.datadoghq.com/stable/observability-pipelines-worker-1/\$basearch/
    enabled=1
    gpgcheck=1
    repo_gpgcheck=1
    gpgkey=https://keys.datadoghq.com/DATADOG_RPM_KEY_CURRENT.public
           https://keys.datadoghq.com/DATADOG_RPM_KEY_4F09D16B.public
           https://keys.datadoghq.com/DATADOG_RPM_KEY_B01082D3.public
           https://keys.datadoghq.com/DATADOG_RPM_KEY_FD4BF915.public
           https://keys.datadoghq.com/DATADOG_RPM_KEY_E09422B3.public
    EOF
    

    Note: If you are running RHEL 8.1 or CentOS 8.1, use repo_gpgcheck=0 instead of repo_gpgcheck=1 in the configuration above.

  2. Update your packages and install the Worker:

    sudo yum makecache
    sudo yum install observability-pipelines-worker
    
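  3. Add your API key, pipeline ID, and Datadog site to the Worker’s environment variables. The file is written through sudo tee because a plain shell redirect (>) runs with your own permissions, not with sudo’s:

    sudo tee /etc/default/observability-pipelines-worker > /dev/null <<EOF
    DD_API_KEY=<API_KEY>
    DD_OP_PIPELINE_ID=<PIPELINE_ID>
    DD_SITE=<SITE>
    EOF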
    
  4. Download the sample configuration file to /etc/observability-pipelines-worker/pipeline.yaml on the host.

  5. Start the worker:

    sudo systemctl restart observability-pipelines-worker
    
  1. Download the sample configuration.
  2. Set up the Worker module in your existing Terraform using the sample configuration. Make sure to update the values in vpc-id, subnet-ids, and region to match your AWS deployment in the configuration. Also, update the values in datadog-api-key and pipeline-id to match your pipeline.

Only use CloudFormation installs for non-production-level workloads.

To install the Worker in your AWS Account, use the CloudFormation template to create a Stack:

  1. Download the CloudFormation template for the Worker.

  2. In the CloudFormation console, click Create stack, and select the With new resources (standard) option.

  3. Make sure that the Template is ready option is selected, and select Upload a template file. Click Choose file and add the CloudFormation template file you downloaded earlier. Click Next.

  4. Enter a name for the stack in Specify stack details.

  5. Fill in the parameters for the CloudFormation template. A few require special attention:

    • For APIKey and PipelineID, provide the key and ID that you gathered earlier in the Prerequisites section.

    • For the VPCID and SubnetIDs, provide the subnets and VPC you chose earlier.

    • All other parameters are set to reasonable defaults for a Worker deployment but you can adjust them for your use case as needed.

  6. Click Next.

  7. Review and make sure the parameters are as expected. Click the necessary permissions checkboxes for IAM, and click Submit to create the Stack.

CloudFormation handles the installation at this point; the Worker instances are launched, download the necessary software, and start running automatically.

See Configurations for more information about the source, transform, and sink used in the sample configuration. See Working with Data for more information on transforming your data.

Load balancing

Production-oriented setup is not included in the Docker instructions. Instead, refer to your company’s standards for load balancing in containerized environments. If you are testing on your local machine, configuring a load balancer is unnecessary.

Use the load balancers provided by your cloud provider. They adjust based on autoscaling events that the default Helm setup is configured for. The load balancers are internal-facing, so they are only accessible inside your network.

Use the load balancer URL given to you by Helm when you configure the Datadog Agent.

On AWS EKS, the Worker uses NLBs provisioned by the AWS Load Balancer Controller.

See Capacity Planning and Scaling for load balancer recommendations when scaling the Worker.

Cross-availability-zone load balancing

The provided Helm configuration tries to simplify load balancing, but you must take into consideration the potential price implications of cross-AZ traffic. Wherever possible, the samples try to avoid creating situations where multiple cross-AZ hops can happen.

On AWS EKS, the sample configurations do not enable the cross-zone load balancing feature available in the AWS Load Balancer Controller. To enable it, add the following annotation to the service block:

service.beta.kubernetes.io/aws-load-balancer-attributes: load_balancing.cross_zone.enabled=true

See AWS Load Balancer Controller for more details.

On Google GKE, Global Access is enabled by default, since it is likely required for use in a shared tools cluster.

For APT-based and RPM-based Linux installs, no built-in support for load balancing is provided, given the single-machine nature of the installation. Provision your own load balancers according to your company’s standard.

An NLB is provisioned by the Terraform module, and configured to point at the instances. Its DNS address is returned in the lb-dns output in Terraform.

Only use CloudFormation installs for non-production-level workloads.

An NLB is provisioned by the CloudFormation template, and is configured to point at the AutoScaling Group. Its DNS address is returned in the LoadBalancerDNS CloudFormation output.

Buffering

Observability Pipelines includes multiple buffering strategies that allow you to increase the resilience of your cluster to downstream faults. The provided sample configurations use disk buffers, the capacities of which are rated for approximately 10 minutes of data at 10Mbps/core for Observability Pipelines deployments. That is often enough time for transient issues to resolve themselves, or for incident responders to decide what needs to be done with the observability data.
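
As a rough illustration of how a time-based buffer rating translates into disk space, the sketch below multiplies throughput, core count, and duration. It assumes that "Mbps" means megabits per second (hence the division by 8 to get megabytes), which the text above does not spell out, and `buffer_mb` is a hypothetical function name. Treat the result as an estimate, not as the formula used to size the sample configurations.

```shell
#!/bin/sh
# buffer_mb CORES MBPS_PER_CORE MINUTES
# Prints the approximate buffer size in megabytes.
# Assumption: Mbps means megabits per second, so bytes = bits / 8.
buffer_mb() {
  echo $(( $1 * $2 * $3 * 60 / 8 ))
}

buffer_mb 1 10 10   # one core at 10 Mbps for 10 minutes: prints 750
buffer_mb 8 10 10   # eight cores at the same rate: prints 6000
```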

By default, the Observability Pipelines Worker’s data directory is set to /var/lib/observability-pipelines-worker. Make sure that your host machine has a sufficient amount of storage capacity allocated to the container’s mountpoint.

For AWS, Datadog recommends using the io2 EBS drive family. Alternatively, gp3 drives can be used.

For Azure AKS, Datadog recommends using the default (also known as managed-csi) disks.

For Google GKE, Datadog recommends using the premium-rwo drive class because it is backed by SSDs. The HDD-backed class, standard-rwo, might not provide enough write performance for the buffers to be useful.

For APT-based and RPM-based Linux installs, the Observability Pipelines Worker’s data directory is set to /var/lib/observability-pipelines-worker by default. If you are using the sample configuration, ensure that this directory has at least 288GB of space available for buffering.

Where possible, mount a separate SSD at that location.

By default, a 288GB EBS drive is allocated to each instance, and the sample configuration above is set to use that for buffering.

EBS drives created by this CloudFormation template have their lifecycle tied to the instance they are created with. This leads to data loss if an instance is terminated, for example by the AutoScaling Group. For this reason, only use CloudFormation installs for non-production-level workloads.

By default, a 288GB EBS drive is allocated to each instance, and is auto-mounted and formatted upon instance boot.

Connect the Datadog Agent to the Observability Pipelines Worker

To send Datadog Agent logs to the Observability Pipelines Worker, update your Agent configuration with the following:

observability_pipelines_worker:
  logs:
    enabled: true
    url: "http://<OPW_HOST>:8282"

OPW_HOST is the IP of the load balancer or machine you set up earlier. For single-host Docker-based installs, this is the IP address of the underlying host. For Kubernetes-based installs, you can retrieve it by running the following command and copying the EXTERNAL-IP:

kubectl get svc opw-observability-pipelines-worker

For Terraform installs, the lb-dns output provides the necessary value. For CloudFormation installs, the LoadBalancerDNS CloudFormation output has the correct URL to use.
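
Whichever install method you used, the value for the url setting has the same shape. The following is a minimal sketch: `opw_url` is a hypothetical helper, and the jsonpath query in the usage comment assumes that your load balancer reports a hostname (AWS NLBs typically do, while other providers may report an ip field instead).

```shell
#!/bin/sh
# opw_url HOST [PORT]: prints the URL to use for
# observability_pipelines_worker.logs.url. PORT defaults to 8282,
# the port used in the configuration above.
opw_url() {
  echo "http://$1:${2:-8282}"
}

# Example usage for Kubernetes installs (requires kubectl access):
#   host=$(kubectl get svc opw-observability-pipelines-worker \
#     -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
#   opw_url "$host"
```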

At this point, your observability data should be going to the Worker and is available for data processing.

Updating deployment modes

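After deploying a pipeline, you can switch deployment methods, for example from a manually managed pipeline to a Remote Configuration enabled pipeline, or the other way around.

To switch from a Remote Configuration deployment to a manually managed deployment:

  1. Navigate to Observability Pipelines and select the pipeline.
  2. Click the cog icon to open the settings.
  3. In Deployment Mode, select manual to enable it.
  4. Set the DD_OP_REMOTE_CONFIGURATION_ENABLED flag to false and restart the Worker. Workers that are not restarted with this flag continue to run with Remote Configuration enabled, which means they are not updated manually through a local configuration file.

To switch from a manually managed deployment to a Remote Configuration deployment:

  1. Navigate to Observability Pipelines and select the pipeline.
  2. Click the cog icon to open the settings.
  3. In Deployment Mode, select Remote Configuration to enable it.
  4. Set the DD_OP_REMOTE_CONFIGURATION_ENABLED flag to true and restart the Worker. Workers must be restarted with this flag before they poll for configurations deployed in the UI.
  5. Deploy a version from your version history so that the Workers receive the new configuration version. Click a version, select Edit as Draft, and then click Deploy.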
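
On APT-based and RPM-based installs, one way to change the flag is to edit the environment file created during installation. The following is a minimal sketch under stated assumptions: `set_env_flag` is a hypothetical helper, and it assumes the Worker reads DD_OP_REMOTE_CONFIGURATION_ENABLED from /etc/default/observability-pipelines-worker in the same way it reads DD_API_KEY.

```shell
#!/bin/sh
# set_env_flag FILE KEY VALUE: sets KEY=VALUE in an env file.
# Any existing KEY= line is dropped and the new line is appended at the end.
# Hypothetical helper for illustration only.
set_env_flag() {
  file=$1 key=$2 value=$3
  tmp=$(mktemp) || return 1
  # Copy every line except an existing KEY= line (an empty result is fine).
  grep -v "^${key}=" "$file" > "$tmp" 2>/dev/null || true
  echo "${key}=${value}" >> "$tmp"
  # Write back in place so the file keeps its owner and permissions.
  cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example usage (run as root, since the env file is typically root-owned):
#   set_env_flag /etc/default/observability-pipelines-worker \
#     DD_OP_REMOTE_CONFIGURATION_ENABLED true
#   systemctl restart observability-pipelines-worker
```

For Docker or Kubernetes installs, the same flag would instead be passed as a container environment variable.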

Working with data

The sample configuration provided has example processing steps that demonstrate Observability Pipelines tools and ensure that data sent to Datadog is in the correct format.

Processing logs

The sample Observability Pipelines configuration does the following:

  • Collects logs sent from the Datadog Agent to the Observability Pipelines Worker.
  • Tags logs coming through the Observability Pipelines Worker. This helps determine what traffic still needs to be shifted over to the Worker as you update your clusters. These tags also show you how logs are being routed through the load balancer, in case there are imbalances.
  • Corrects the status of logs coming through the Worker. Due to how the Datadog Agent collects logs from containers, the provided .status attribute does not properly reflect the actual level of the message. It is removed to prevent issues with parsing rules in the backend, where logs are received from the Worker.

The following are two important components in the example configuration:

  • logs_parse_ddtags: Parses the tags that are stored in a string into structured data.
  • logs_finish_ddtags: Re-encodes the tags into the format in which the Datadog Agent would send them.

Internally, the Datadog Agent represents log tags as comma-separated values in a single string. To manipulate these tags effectively, they must be parsed, modified, and then re-encoded before they are sent to the ingest endpoint. These two steps perform those actions for you automatically. Any modifications you make to the pipeline, especially for manipulating tags, should be placed between these two steps.
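
To make the parse, modify, and re-encode cycle concrete, here is a small shell sketch of the same idea. It is only an illustration: the pipeline performs these steps in its own transforms, not in shell, and both `add_tag` and the example tag values are hypothetical.

```shell
#!/bin/sh
# add_tag TAGS NEW_TAG
# 1. Parse: split the comma-separated TAGS string into one tag per line.
# 2. Modify: append NEW_TAG, dropping duplicates while keeping order.
# 3. Re-encode: join the lines back into a single comma-separated string.
add_tag() {
  {
    printf '%s\n' "$1" | tr ',' '\n' | sed '/^$/d'
    printf '%s\n' "$2"
  } | awk '!seen[$0]++' | paste -sd, -
}

add_tag 'env:prod,service:web' 'team:core'   # prints env:prod,service:web,team:core
```

Working on the parsed form avoids string-level mistakes such as adding a tag twice or leaving a stray comma, which is why changes belong between the two steps.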

At this point, your environment is configured for Observability Pipelines with data flowing through it. Further configuration is likely required for your specific use cases, but the tools provided give you a starting point.
