Redpanda

Supported OS Linux Windows Mac OS

Integration version 2.0.0

Overview

Redpanda is a Kafka API-compatible streaming platform for mission-critical workloads.

Connect Datadog with Redpanda to view key metrics and add further metric groups based on specific user needs.

Setup

Installation

  1. Download and launch the Datadog Agent.
  2. Manually install the Redpanda integration. See Use Community Integrations for details specific to your environment.

Host

To configure this check for an Agent running on a host, run datadog-agent integration install -t datadog-redpanda==<INTEGRATION_VERSION>.

Containerized

For containerized environments, the best way to use this integration with the Docker Agent is to build an Agent image with the Redpanda integration installed.

To build an updated version of the Agent:

  1. Use the following Dockerfile:

FROM gcr.io/datadoghq/agent:latest

ARG INTEGRATION_VERSION=2.0.0

RUN agent integration install -r -t datadog-redpanda==${INTEGRATION_VERSION}

  2. Build the image and push it to your private Docker registry.

  3. Upgrade the Datadog Agent container image. If you are using the Helm chart, modify the agents.image section of the values.yaml file to replace the default Agent image:

agents:
  enabled: true
  image:
    tag: <NEW_TAG>
    repository: <YOUR_PRIVATE_REPOSITORY>/<AGENT_NAME>

  4. Use the updated values.yaml file to upgrade the Agent:

helm upgrade -f values.yaml <RELEASE_NAME> datadog/datadog

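Configuration

Host

Metric collection

To start collecting your Redpanda performance data:

  1. Edit the redpanda.d/conf.yaml file in the conf.d/ folder at the root of your Agent's configuration directory. See the sample redpanda.d/conf.yaml.example for all available configuration options.

  2. Restart the Agent.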
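
As a rough illustration of step 1, a minimal redpanda.d/conf.yaml might look like the sketch below. The endpoint URL is an assumption: it presumes Redpanda serves Prometheus-format metrics from its Admin API on localhost port 9644 at /public_metrics. Check redpanda.d/conf.yaml.example for the authoritative option names and defaults in your installed version.

```yaml
# redpanda.d/conf.yaml (illustrative sketch, not a complete configuration)
init_config:

instances:
    # Assumed endpoint: adjust host, port, and path to match your Redpanda deployment.
  - openmetrics_endpoint: http://localhost:9644/public_metrics
```

After saving the file, restart the Agent (step 2) so the check picks up the new instance.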

Log collection

By default, log collection is disabled in the Datadog Agent. Log collection is available in Agent v6.0 and later.

  1. To enable logs, add the following to your datadog.yaml file:

    logs_enabled: true
    
  2. Make sure the dd-agent user is a member of the systemd-journal group. If it is not, run the following command as root:

    usermod -a -G systemd-journal dd-agent
    
  3. Add the following to your redpanda.d/conf.yaml file to start collecting your Redpanda logs:

     logs:
     - type: journald
       source: redpanda
    

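Containerized

Metric collection

In containerized environments, Autodiscovery is configured by default once the Redpanda check is built into the Datadog Agent image.

Metrics are then collected and sent to Datadog automatically. For more information, see Autodiscovery Integration Templates.

Log collection

By default, log collection is disabled in the Datadog Agent. Log collection is available in Agent v6.0 and later.

To enable log collection, see Kubernetes Log Collection.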

Parameter       Value
<LOG_CONFIG>    {"source": "redpanda", "service": "redpanda_cluster"}
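
One common way to supply the <LOG_CONFIG> value on Kubernetes is a pod annotation read by Autodiscovery. The fragment below is a sketch under assumptions: the pod name and the container name redpanda are hypothetical, and the annotation key must use the actual container name from your pod spec.

```yaml
# Pod metadata fragment (illustrative; names are hypothetical)
apiVersion: v1
kind: Pod
metadata:
  name: redpanda-0
  annotations:
    # "redpanda" in the key is the assumed container name.
    ad.datadoghq.com/redpanda.logs: '[{"source": "redpanda", "service": "redpanda_cluster"}]'
spec:
  containers:
    - name: redpanda
      image: <YOUR_REDPANDA_IMAGE>
```

The annotation value is a JSON list, so the <LOG_CONFIG> object is wrapped in square brackets.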

Validation

Run the Agent's status subcommand and look for redpanda under the Checks section.

Data Collected

Metrics

redpanda.application.build
(gauge)
Redpanda build information
redpanda.application.uptime
(gauge)
Redpanda uptime in seconds
Shown as second
redpanda.controller.log_limit_requests_available
(gauge)
Controller log rate limiting. Requests per second available for the group
Shown as request
redpanda.controller.log_limit_requests_dropped
(count)
Controller log rate limiting. Number of requests dropped for exceeding the group's limit
Shown as request
redpanda.partitions.moving_from_node
(gauge)
Number of partitions that are moving from the node
redpanda.partitions.moving_to_node
(gauge)
Number of partitions that are moving to the node
redpanda.partitions.node_cancelling_movements
(gauge)
Number of partition movements being cancelled for the node
redpanda.reactor.cpu_busy_seconds
(gauge)
Total CPU busy time in seconds
Shown as second
redpanda.io_queue.total_read_ops
(count)
Total read operations passed in the queue
Shown as operation
redpanda.io_queue.total_write_ops
(count)
Total write operations passed in the queue
Shown as operation
redpanda.kafka.group_offset
(gauge)
Consumer group committed offset
redpanda.kafka.group_count
(gauge)
Number of consumers in a group
redpanda.kafka.group_topic_count
(gauge)
Number of topics in a group
redpanda.cluster.partitions
(gauge)
Configured number of partitions for the topic
redpanda.cluster.replicas
(gauge)
Configured number of replicas for the topic
redpanda.kafka.request_latency_seconds
(gauge)
Internal latency of kafka produce requests
Shown as second
redpanda.kafka.under_replicated_replicas
(gauge)
Number of under-replicated replicas (i.e. replicas that are live, but not at the latest offset)
redpanda.memory.allocated_memory
(gauge)
Allocated memory size in bytes
Shown as byte
redpanda.memory.available_memory_low_water_mark
(gauge)
The low-water mark for available_memory from process start
Shown as byte
redpanda.memory.available_memory
(gauge)
Total shard memory potentially available in bytes (free_memory plus reclaimable)
Shown as byte
redpanda.memory.free_memory
(gauge)
Free memory size in bytes
Shown as byte
redpanda.node_status.rpcs_received
(gauge)
Number of node status RPCs received by this node
Shown as request
redpanda.node_status.rpcs_sent
(gauge)
Number of node status RPCs sent by this node
Shown as request
redpanda.node_status.rpcs_timed_out
(gauge)
Number of timed out node status RPCs from this node
Shown as request
redpanda.raft.leadership_changes
(count)
Number of leadership changes across all partitions of a given topic
redpanda.raft.recovery_bandwidth
(gauge)
Bandwidth available for partition movement, in bytes per second
redpanda.pandaproxy.request_errors
(count)
Total number of rest_proxy server errors
Shown as error
redpanda.pandaproxy.request_latency
(gauge)
Internal latency of request for rest_proxy
Shown as millisecond
redpanda.rpc.active_connections
(gauge)
Count of currently active connections
Shown as connection
redpanda.rpc.request_errors
(count)
Number of rpc errors
Shown as error
redpanda.rpc.request_latency_seconds
(gauge)
RPC latency
Shown as second
redpanda.scheduler.runtime_seconds
(count)
Accumulated runtime of task queue associated with this scheduling group
Shown as second
redpanda.schema_registry.errors
(count)
Total number of schema_registry server errors
Shown as error
redpanda.schema_registry_latency_seconds
(gauge)
Internal latency of request for schema_registry
Shown as second
redpanda.storage.disk_free_bytes
(count)
Disk storage bytes free.
Shown as byte
redpanda.storage.disk_free_space_alert
(gauge)
Status of the low storage space alert. 0-OK, 1-Low Space, 2-Degraded
redpanda.storage.disk_total_bytes
(count)
Total size of attached storage, in bytes.
Shown as byte
redpanda.cloud.client_backoff
(count)
Total number of requests that backed off
redpanda.cloud.client_download_backoff
(count)
Total number of download requests that backed off
redpanda.cloud.client_downloads
(count)
Total number of requests that downloaded an object from cloud storage
redpanda.cloud.client_not_found
(count)
Total number of requests for which the object was not found
redpanda.cloud.client_upload_backoff
(count)
Total number of upload requests that backed off
redpanda.cloud.client_uploads
(count)
Total number of requests that uploaded an object to cloud storage
redpanda.cloud.storage.active_segments
(gauge)
Number of remote log segments currently hydrated for read
redpanda.cloud.storage.cache_op_hit
(count)
Number of get requests for objects that are already in cache.
redpanda.cloud.storage.op_in_progress_files
(gauge)
Number of files that are being put to cache.
redpanda.cloud.storage.cache_op_miss
(count)
Number of get requests that are not satisfied from the cache.
redpanda.cloud.storage.op_put
(count)
Number of objects written into cache.
Shown as operation
redpanda.cloud.storage.cache_space_files
(gauge)
Number of objects in cache.
redpanda.cloud.storage.cache_space_size_bytes
(gauge)
Sum of size of cached objects.
Shown as byte
redpanda.cloud.storage.deleted_segments
(count)
Number of segments that have been deleted from S3 for the topic. This may grow due to retention or non-compacted segments being replaced with their compacted equivalent.
redpanda.cloud.storage.errors
(count)
Number of transmit errors
Shown as error
redpanda.cloud.storage.housekeeping.drains
(gauge)
Number of times upload housekeeping queue was drained
redpanda.cloud.storage.housekeeping.jobs_completed
(count)
Number of executed housekeeping jobs
redpanda.cloud.storage.housekeeping.jobs_failed
(count)
Number of failed housekeeping jobs
Shown as error
redpanda.cloud.storage.housekeeping.jobs_skipped
(count)
Number of skipped housekeeping jobs
redpanda.cloud.storage.housekeeping.pauses
(gauge)
Number of times upload housekeeping was paused
redpanda.cloud.storage.housekeeping.resumes
(gauge)
Number of times upload housekeeping was resumed
redpanda.cloud.storage.housekeeping.rounds
(count)
Number of upload housekeeping rounds
redpanda.cloud.storage.jobs.cloud_segment_reuploads
(gauge)
Number of segment reuploads from cloud storage sources (cloud storage cache or direct download from cloud storage)
redpanda.cloud.storage.jobs.local_segment_reuploads
(gauge)
Number of segment reuploads from local data directory
redpanda.cloud.storage.jobs.manifest_reuploads
(gauge)
Number of manifest reuploads performed by all housekeeping jobs
redpanda.cloud.storage.jobs.metadata_syncs
(gauge)
Number of archival configuration updates performed by all housekeeping jobs
redpanda.cloud.storage.jobs.segment_deletions
(gauge)
Number of segments deleted by all housekeeping jobs
redpanda.cloud.storage.readers
(gauge)
Number of read cursors for hydrated remote log segments
redpanda.cloud.storage.segments
(gauge)
Total number of accounted segments in the cloud for the topic
redpanda.cloud.storage.segments_pending_deletion
(gauge)
Total number of segments pending deletion from the cloud for the topic
redpanda.cloud.storage.uploaded_bytes
(count)
Total number of uploaded bytes for the topic
Shown as byte
redpanda.cluster.brokers
(gauge)
Number of configured brokers in the cluster
redpanda.cluster.controller_log_limit_requests_dropped
(count)
Controller log rate limiting. Number of requests dropped for exceeding the group's limit
redpanda.cluster.partition_num_with_broken_rack_constraint
(gauge)
Number of partitions that don't satisfy the rack awareness constraint
redpanda.cluster.topics
(gauge)
Number of topics in the cluster
redpanda.cluster.unavailable_partitions
(gauge)
Number of partitions that lack quorum among replicas
redpanda.kafka.partition_committed_offset
(gauge)
Latest committed offset for the partition (i.e. the offset of the last message safely persisted on most replicas)
redpanda.kafka.partitions
(gauge)
Configured number of partitions for the topic
redpanda.kafka.replicas
(gauge)
Configured number of replicas for the topic
redpanda.kafka.request_bytes
(count)
Total number of bytes produced per topic
Shown as byte

Events

The Redpanda integration does not include any events.

Service Checks

redpanda.openmetrics.health
Returns CRITICAL if the check cannot access the metrics endpoint. Returns OK otherwise.
Statuses: ok, critical

Troubleshooting

Need help? Contact Datadog support.
