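Redpanda is a Kafka API-compatible streaming platform for mission-critical workloads.
Connect Datadog with Redpanda to view key metrics and add additional metric groups based on specific user needs.
To install this check for an Agent running on a host, run:
datadog-agent integration install -t datadog-redpanda==<INTEGRATION_VERSION>
For containerized environments, the best way to use this integration with the Docker Agent is to build an Agent image with the Redpanda integration installed.
To build an updated version of the Agent, use the following Dockerfile: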
FROM gcr.io/datadoghq/agent:latest
ARG INTEGRATION_VERSION=2.0.0
RUN agent integration install -r -t datadog-redpanda==${INTEGRATION_VERSION}
Build the image and push it to your private Docker registry.
Upgrade the Datadog Agent container image. If you are using the Helm chart, modify the agents.image section of the values.yaml file to replace the default Agent image:
agents:
  enabled: true
  image:
    tag: <NEW_TAG>
    repository: <YOUR_PRIVATE_REPOSITORY>/<AGENT_NAME>
Use the updated values.yaml file to upgrade the Agent:
helm upgrade -f values.yaml <RELEASE_NAME> datadog/datadog
To start collecting Redpanda performance data:
Edit the redpanda.d/conf.yaml file in the conf.d/ folder at the root of your Agent's configuration directory. See the sample redpanda.d/conf.yaml.example for all available configuration options.
Restart the Agent.
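As a rough illustration of the file described above, a minimal redpanda.d/conf.yaml might look like the following sketch. The openmetrics_endpoint option name, port 9644, and the /public_metrics path are assumptions not stated on this page (they follow the usual shape of OpenMetrics-based checks and Redpanda's default admin port). Confirm them against redpanda.d/conf.yaml.example before relying on them.

```yaml
# Sketch only: the option name and URL below are assumptions.
# Verify them against redpanda.d/conf.yaml.example.
init_config:

instances:
  # Assumed: URL of the Redpanda metrics endpoint that the check scrapes.
  - openmetrics_endpoint: http://localhost:9644/public_metrics
```

After editing the file, restart the Agent so the new instance is picked up.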
Log collection is disabled by default in the Datadog Agent. It is available in Agent v6.0 and later.
To enable logs, add the following to your datadog.yaml file:
logs_enabled: true
Make sure the dd-agent user is a member of the systemd-journal group. If it is not, run the following command as root:
usermod -a -G systemd-journal dd-agent
Add the following to your redpanda.d/conf.yaml file to start collecting Redpanda logs:
logs:
  - type: journald
    source: redpanda
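In containerized environments, Autodiscovery is configured by default once the Redpanda check is integrated into the Datadog Agent image.
Metrics are automatically collected and sent to Datadog's servers. For more information, see Autodiscovery Integration Templates.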
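If the default Autodiscovery behavior ever needs to be overridden for a specific pod, an integration template can be supplied through pod annotations. The manifest below is a hedged sketch, not the integration's documented template: the pod and container name redpanda, the <REDPANDA_IMAGE> placeholder, the openmetrics_endpoint option, and the port and path are all assumptions for illustration. It also assumes an Agent version that supports the ad.datadoghq.com/<CONTAINER_NAME>.checks annotation.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: redpanda                 # hypothetical pod name
  annotations:
    # The "redpanda" before ".checks" must match the container name below.
    ad.datadoghq.com/redpanda.checks: |
      {
        "redpanda": {
          "init_config": {},
          "instances": [
            {"openmetrics_endpoint": "http://%%host%%:9644/public_metrics"}
          ]
        }
      }
spec:
  containers:
    - name: redpanda             # hypothetical container name
      image: <REDPANDA_IMAGE>    # placeholder, not a real image reference
```

The %%host%% template variable is resolved by the Agent to the container's IP address at runtime.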
Log collection is disabled by default in the Datadog Agent. It is available in Agent v6.0 and later.
To enable log collection, see Kubernetes Log Collection.
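For Agents deployed with the Datadog Helm chart, log collection is typically switched on in values.yaml. The fragment below is a minimal sketch that assumes the chart's datadog.logs keys. Whether to collect logs from every container (containerCollectAll) is a choice for your environment, not something this page prescribes.

```yaml
# Sketch of a values.yaml fragment for the Datadog Helm chart.
datadog:
  logs:
    # Turn on log collection in the Agent.
    enabled: true
    # Optional: collect logs from all discovered containers.
    containerCollectAll: true
```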
Parameter | Value |
---|---|
<LOG_CONFIG> | {"source": "redpanda", "service": "redpanda_cluster"} |
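On Kubernetes, the <LOG_CONFIG> value above is commonly applied through a pod annotation. The fragment below is a sketch under that assumption. <CONTAINER_NAME> is a placeholder for the name of the Redpanda container in your pod spec, and the annotation value wraps the log configuration in a JSON list.

```yaml
# Sketch: pod metadata fragment carrying the log configuration.
metadata:
  annotations:
    # Replace <CONTAINER_NAME> with the Redpanda container's name.
    ad.datadoghq.com/<CONTAINER_NAME>.logs: '[{"source": "redpanda", "service": "redpanda_cluster"}]'
```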
Run the Agent's status subcommand and look for redpanda under the Checks section.
redpanda.application.build (gauge) | Redpanda build information |
redpanda.application.uptime (gauge) | Redpanda uptime in seconds Shown as second |
redpanda.controller.log_limit_requests_available (gauge) | Controller log rate limiting. Available rps for group Shown as request |
redpanda.controller.log_limit_requests_dropped (count) | Controller log rate limiting. Amount of requests that are dropped due to exceeding limit in group Shown as request |
redpanda.partitions.moving_from_node (gauge) | Amount of partitions that are moving from node |
redpanda.partitions.moving_to_node (gauge) | Amount of partitions that are moving to node |
redpanda.partitions.node_cancelling_movements (gauge) | Amount of cancelling partition movements for node |
redpanda.reactor.cpu_busy_seconds (gauge) | Total CPU busy time in seconds Shown as second |
redpanda.io_queue.total_read_ops (count) | Total read operations passed in the queue Shown as operation |
redpanda.io_queue.total_write_ops (count) | Total write operations passed in the queue Shown as operation |
redpanda.kafka.group_offset (gauge) | Consumer group committed offset |
redpanda.kafka.group_count (gauge) | Number of consumers in a group |
redpanda.kafka.group_topic_count (gauge) | Number of topics in a group |
redpanda.cluster.partitions (gauge) | Configured number of partitions for the topic |
redpanda.cluster.replicas (gauge) | Configured number of replicas for the topic |
redpanda.kafka.request_latency_seconds (gauge) | Internal latency of kafka produce requests Shown as second |
redpanda.kafka.under_replicated_replicas (gauge) | Number of under replicated replicas (i.e. replicas that are live, but not at the latest offset) |
redpanda.memory.allocated_memory (gauge) | Allocated memory size in bytes Shown as byte |
redpanda.memory.available_memory_low_water_mark (gauge) | The low-water mark for available_memory from process start Shown as byte |
redpanda.memory.available_memory (gauge) | Total shard memory potentially available in bytes (free_memory plus reclaimable) Shown as byte |
redpanda.memory.free_memory (gauge) | Free memory size in bytes Shown as byte |
redpanda.node_status.rpcs_received (gauge) | Number of node status RPCs received by this node Shown as request |
redpanda.node_status.rpcs_sent (gauge) | Number of node status RPCs sent by this node Shown as request |
redpanda.node_status.rpcs_timed_out (gauge) | Number of timed out node status RPCs from this node Shown as request |
redpanda.raft.leadership_changes (count) | Number of leadership changes across all partitions of a given topic |
redpanda.raft.recovery_bandwidth (gauge) | Bandwidth available for partition movement. bytes/sec |
redpanda.pandaproxy.request_errors (count) | Total number of rest_proxy server errors Shown as error |
redpanda.pandaproxy.request_latency (gauge) | Internal latency of request for rest_proxy Shown as millisecond |
redpanda.rpc.active_connections (gauge) | Count of currently active connections Shown as connection |
redpanda.rpc.request_errors (count) | Number of rpc errors Shown as error |
redpanda.rpc.request_latency_seconds (gauge) | RPC latency Shown as second |
redpanda.scheduler.runtime_seconds (count) | Accumulated runtime of task queue associated with this scheduling group Shown as second |
redpanda.schema_registry.errors (count) | Total number of schema_registry server errors Shown as error |
redpanda.schema_registry_latency_seconds (gauge) | Internal latency of request for schema_registry Shown as second |
redpanda.storage.disk_free_bytes (count) | Disk storage bytes free. Shown as byte |
redpanda.storage.disk_free_space_alert (gauge) | Status of low storage space alert. 0-OK, 1-Low Space, 2-Degraded |
redpanda.storage.disk_total_bytes (count) | Total size of attached storage, in bytes. Shown as byte |
redpanda.cloud.client_backoff (count) | Total number of requests that backed off |
redpanda.cloud.client_download_backoff (count) | Total number of download requests that backed off |
redpanda.cloud.client_downloads (count) | Total number of requests that downloaded an object from cloud storage |
redpanda.cloud.client_not_found (count) | Total number of requests for which the object was not found |
redpanda.cloud.client_upload_backoff (count) | Total number of upload requests that backed off |
redpanda.cloud.client_uploads (count) | Total number of requests that uploaded an object to cloud storage |
redpanda.cloud.storage.active_segments (gauge) | Number of remote log segments currently hydrated for read |
redpanda.cloud.storage.cache_op_hit (count) | Number of get requests for objects that are already in cache. |
redpanda.cloud.storage.op_in_progress_files (gauge) | Number of files that are being put to cache. |
redpanda.cloud.storage.cache_op_miss (count) | Number of get requests that are not satisfied from the cache. |
redpanda.cloud.storage.op_put (count) | Number of objects written into cache. Shown as operation |
redpanda.cloud.storage.cache_space_files (gauge) | Number of objects in cache. |
redpanda.cloud.storage.cache_space_size_bytes (gauge) | Sum of size of cached objects. Shown as byte |
redpanda.cloud.storage.deleted_segments (count) | Number of segments that have been deleted from S3 for the topic. This may grow due to retention or non compacted segments being replaced with their compacted equivalent. |
redpanda.cloud.storage.errors (count) | Number of transmit errors Shown as error |
redpanda.cloud.storage.housekeeping.drains (gauge) | Number of times upload housekeeping queue was drained |
redpanda.cloud.storage.housekeeping.jobs_completed (count) | Number of executed housekeeping jobs |
redpanda.cloud.storage.housekeeping.jobs_failed (count) | Number of failed housekeeping jobs Shown as error |
redpanda.cloud.storage.housekeeping.jobs_skipped (count) | Number of skipped housekeeping jobs |
redpanda.cloud.storage.housekeeping.pauses (gauge) | Number of times upload housekeeping was paused |
redpanda.cloud.storage.housekeeping.resumes (gauge) | Number of times upload housekeeping was resumed |
redpanda.cloud.storage.housekeeping.rounds (count) | Number of upload housekeeping rounds |
redpanda.cloud.storage.jobs.cloud_segment_reuploads (gauge) | Number of segment reuploads from cloud storage sources (cloud storage cache or direct download from cloud storage) |
redpanda.cloud.storage.jobs.local_segment_reuploads (gauge) | Number of segment reuploads from local data directory |
redpanda.cloud.storage.jobs.manifest_reuploads (gauge) | Number of manifest reuploads performed by all housekeeping jobs |
redpanda.cloud.storage.jobs.metadata_syncs (gauge) | Number of archival configuration updates performed by all housekeeping jobs |
redpanda.cloud.storage.jobs.segment_deletions (gauge) | Number of segments deleted by all housekeeping jobs |
redpanda.cloud.storage.readers (gauge) | Number of read cursors for hydrated remote log segments |
redpanda.cloud.storage.segments (gauge) | Total number of accounted segments in the cloud for the topic |
redpanda.cloud.storage.segments_pending_deletion (gauge) | Total number of segments pending deletion from the cloud for the topic |
redpanda.cloud.storage.uploaded_bytes (count) | Total number of uploaded bytes for the topic Shown as byte |
redpanda.cluster.brokers (gauge) | Number of configured brokers in the cluster |
redpanda.cluster.controller_log_limit_requests_dropped (count) | Controller log rate limiting. Amount of requests that are dropped due to exceeding limit in group |
redpanda.cluster.partition_num_with_broken_rack_constraint (gauge) | Number of partitions that don't satisfy the rack awareness constraint |
redpanda.cluster.topics (gauge) | Number of topics in the cluster |
redpanda.cluster.unavailable_partitions (gauge) | Number of partitions that lack quorum among replicants |
redpanda.kafka.partition_committed_offset (gauge) | Latest committed offset for the partition (i.e. the offset of the last message safely persisted on most replicas) |
redpanda.kafka.partitions (gauge) | Configured number of partitions for the topic |
redpanda.kafka.replicas (gauge) | Configured number of replicas for the topic |
redpanda.kafka.request_bytes (count) | Total number of bytes produced per topic Shown as byte |
The Redpanda integration does not include any events.
redpanda.openmetrics.health
Returns CRITICAL if the check cannot access the metrics endpoint. Returns OK otherwise.
Statuses: ok, critical
Need help? Contact Datadog support.