Google Cloud Dataflow


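Overview

Google Cloud Dataflow is a fully managed service for transforming and enriching data in stream (real-time) and batch (historical) modes with equal reliability and expressiveness.

Use the Datadog Google Cloud integration to collect metrics from Google Cloud Dataflow.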

Setup

Metric collection

Installation

If you haven't already, set up the Google Cloud Platform integration first. No other installation steps are required.

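Log collection

Google Cloud Dataflow logs are collected with Google Cloud Logging and sent to a Dataflow job through a Cloud Pub/Sub topic. If you haven't already, set up logging with the Datadog Dataflow template.

Once this is done, export your Google Cloud Dataflow logs from Google Cloud Logging to the Pub/Sub topic:

1. Go to the Google Cloud Logging page and filter the Google Cloud Dataflow logs.
2. Click Create Sink and name the sink accordingly.
3. Choose "Cloud Pub/Sub" as the destination and select the Pub/Sub topic that was created for that purpose. Note: The Pub/Sub topic can be located in a different project.
4. Click Create and wait for the confirmation message to appear.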
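
The sink created in the steps above comes down to three values: a name, a log filter, and a Pub/Sub destination. The following is a minimal sketch of how those values might be assembled, not an official Datadog or Google snippet. The function name `build_dataflow_sink` and the sink, project, and topic names are hypothetical. It assumes that Dataflow logs are reported under the `dataflow_step` monitored resource type and that the destination follows the `pubsub.googleapis.com/projects/<project>/topics/<topic>` form.

```python
def build_dataflow_sink(sink_name, topic_project, topic_name, job_name=None):
    """Return a dict with the name, filter, and destination of a log sink.

    All argument values are illustrative; replace them with your own.
    """
    # Assumption: Dataflow job and worker logs use the "dataflow_step" resource type.
    log_filter = 'resource.type="dataflow_step"'
    if job_name is not None:
        # Optionally narrow the export to a single Dataflow job.
        log_filter += f' AND resource.labels.job_name="{job_name}"'

    # The topic may live in a different project than the logs being exported.
    destination = (
        f"pubsub.googleapis.com/projects/{topic_project}/topics/{topic_name}"
    )
    return {"name": sink_name, "filter": log_filter, "destination": destination}


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    sink = build_dataflow_sink(
        "dataflow-to-datadog", "my-logging-project", "export-logs-to-datadog"
    )
    for key, value in sink.items():
        print(f"{key}: {value}")
```

The same three values can be entered in the Create Sink form, or passed to `gcloud logging sinks create` as the sink name, the destination, and the `--log-filter` option.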

Data collected

Metrics

gcp.dataflow.job.billable_shuffle_data_processed (gauge)	The billable bytes of shuffle data processed by this Dataflow job. Shown as byte
gcp.dataflow.job.current_num_vcpus (gauge)	The number of vCPUs currently being used by this Dataflow job. Shown as cpu
gcp.dataflow.job.current_shuffle_slots (gauge)	The current shuffle slots used by this Dataflow job.
gcp.dataflow.job.data_watermark_age (gauge)	The age (time since event timestamp) of the most recent item of data that has been fully processed by the pipeline. Shown as second
gcp.dataflow.job.elapsed_time (gauge)	Duration that the current run of this pipeline has been in the Running state so far, in seconds. When a run completes, this stays at the duration of that run until the next run starts. Shown as second
gcp.dataflow.job.element_count (count)	Number of elements added to the PCollection so far. Shown as item
gcp.dataflow.job.estimated_byte_count (count)	An estimated number of bytes added to the PCollection so far. Shown as byte
gcp.dataflow.job.is_failed (gauge)	Whether this job has failed.
gcp.dataflow.job.system_lag (gauge)	The current maximum duration that an item of data has been awaiting processing, in seconds. Shown as second
gcp.dataflow.job.total_memory_usage_time (gauge)	The total GB seconds of memory allocated to this Dataflow job. Shown as gibibyte
gcp.dataflow.job.total_pd_usage_time (gauge)	The total GB seconds for all persistent disk used by all workers associated with this Dataflow job. Shown as gibibyte
gcp.dataflow.job.total_shuffle_data_processed (gauge)	The total bytes of shuffle data processed by this Dataflow job. Shown as byte
gcp.dataflow.job.total_streaming_data_processed (gauge)	The total bytes of streaming data processed by this Dataflow job. Shown as byte
gcp.dataflow.job.total_vcpu_time (gauge)	The total vCPU seconds used by this Dataflow job.
gcp.dataflow.job.user_counter (gauge)	A user-defined counter metric.
gcp.dataflow.quota.region_endpoint_shuffle_slot.exceeded (count)	Number of attempts to exceed the limit on quota metric dataflow.googleapis.com/region_endpoint_shuffle_slot.
gcp.dataflow.quota.region_endpoint_shuffle_slot.limit (gauge)	Current limit on quota metric dataflow.googleapis.com/region_endpoint_shuffle_slot.
gcp.dataflow.quota.region_endpoint_shuffle_slot.usage (gauge)	Current usage on quota metric dataflow.googleapis.com/region_endpoint_shuffle_slot.
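
Once these metrics are flowing, they can be read back through the Datadog timeseries query endpoint (`GET /api/v1/query`, which takes `from`, `to`, and `query` parameters). The sketch below only builds the query parameters and does not call the API. The helper names `build_metric_query` and `quota_utilization` are hypothetical, and the default `avg` aggregator and `*` scope are assumptions to adjust for your own jobs. The second helper shows one way the `.usage` and `.limit` quota gauges might be combined into a utilization ratio.

```python
import time


def build_metric_query(metric, window_seconds=3600, scope="*",
                       aggregator="avg", now=None):
    """Build the from/to/query parameters for a timeseries query.

    `now` is a Unix timestamp in seconds; it defaults to the current time.
    """
    now = int(time.time()) if now is None else int(now)
    return {
        "from": now - window_seconds,
        "to": now,
        # Doubled braces produce literal braces around the scope.
        "query": f"{aggregator}:{metric}{{{scope}}}",
    }


def quota_utilization(usage, limit):
    """Return usage / limit as a fraction, or None if no usable limit is set."""
    if limit <= 0:
        return None
    return usage / limit


if __name__ == "__main__":
    # Parameters for the last 10 minutes of system lag across all jobs.
    print(build_metric_query("gcp.dataflow.job.system_lag", window_seconds=600))
    # Hypothetical readings of the shuffle slot quota gauges.
    print(quota_utilization(usage=30, limit=120))
```

The returned dictionary can be passed as query parameters by any HTTP client, together with the `DD-API-KEY` and `DD-APPLICATION-KEY` headers.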