Azure IoT Edge

Supported OS: Windows, Mac OS

Integration version 4.2.0

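Overview

Azure IoT Edge is a fully managed service that deploys cloud workloads to run on Internet of Things (IoT) Edge devices using standard containers.

Use the Datadog-Azure IoT Edge integration to collect metrics and health status from IoT Edge devices.

Note: This integration requires IoT Edge runtime version 1.0.10 or above.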

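Setup

Follow the instructions below to install and configure this check for an IoT Edge device running on a device host.

Installation

The Azure IoT Edge check is included in the Datadog Agent package.

No additional installation is needed on your device.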

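Configuration

Configure the IoT Edge device so that the Agent runs as a custom module. For details on installing and working with custom modules for Azure IoT Edge, see the Microsoft documentation on deploying Azure IoT Edge modules.

Follow the steps below to configure the IoT Edge device, the runtime modules, and the Datadog Agent to collect IoT Edge metrics.

  1. Configure the Edge Agent runtime module as follows:

    • The image version must be 1.0.10 or above.

    • Under "Create Options", add the following Labels and edit the com.datadoghq.ad.instances label as appropriate. See the sample azure_iot_edge.d/conf.yaml for all available configuration options. For more information on label-based integration configuration, see the Docker Integrations Autodiscovery documentation.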

      "Labels": {
          "com.datadoghq.ad.check_names": "[\"azure_iot_edge\"]",
          "com.datadoghq.ad.init_configs": "[{}]",
          "com.datadoghq.ad.instances": "[{\"edge_hub_prometheus_url\": \"http://edgeHub:9600/metrics\", \"edge_agent_prometheus_url\": \"http://edgeAgent:9600/metrics\"}]"
      }
      
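  2. Configure the Edge Hub runtime module as follows:

    • The image version must be 1.0.10 or above.

  3. Install and configure the Datadog Agent as a custom module:

    • Set the module name (for example, datadog-agent).

    • Set the Agent image URI (for example, datadog/agent:7).

    • Under "Environment Variables", set DD_API_KEY. You can also set additional Agent configuration here (see Agent Environment Variables).

    • Under "Container Create Options", enter the following configuration based on your device OS. Note: NetworkId must match the network name set in the device config.yaml file.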

      • Linux:
        {
            "HostConfig": {
                "NetworkMode": "default",
                "Env": ["NetworkId=azure-iot-edge"],
                "Binds": ["/var/run/docker.sock:/var/run/docker.sock"]
            }
        }
        
      • Windows:
        {
            "HostConfig": {
                "NetworkMode": "default",
                "Env": ["NetworkId=nat"],
                "Binds": ["//./pipe/iotedge_moby_engine:/./pipe/docker_engine"]
            }
        }
        
    • Save the Datadog Agent custom module.

  4. Save and deploy the changes to your device configuration.
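
Because each Autodiscovery label value is a JSON document stored inside a JSON string, hand-written escaping is easy to get wrong. The following is a minimal Python sketch, not an official tool, that generates the same Labels and Container Create Options shown in the steps above. The function names edge_agent_labels and agent_create_options are hypothetical helpers introduced for illustration; the URLs, network names, and bind paths are copied from the examples in this section.

```python
import json


def edge_agent_labels(hub_url="http://edgeHub:9600/metrics",
                      agent_url="http://edgeAgent:9600/metrics"):
    """Build the Autodiscovery labels for the Edge Agent runtime module."""
    instance = {
        "edge_hub_prometheus_url": hub_url,
        "edge_agent_prometheus_url": agent_url,
    }
    # Each label value is a JSON document serialized into a string.
    return {
        "com.datadoghq.ad.check_names": json.dumps(["azure_iot_edge"]),
        "com.datadoghq.ad.init_configs": json.dumps([{}]),
        "com.datadoghq.ad.instances": json.dumps([instance]),
    }


def agent_create_options(device_os):
    """Build the Datadog Agent module's create options for 'linux' or 'windows'."""
    settings = {
        "linux": ("azure-iot-edge",
                  "/var/run/docker.sock:/var/run/docker.sock"),
        "windows": ("nat",
                    "//./pipe/iotedge_moby_engine:/./pipe/docker_engine"),
    }
    network_id, bind = settings[device_os]
    return {
        "HostConfig": {
            "NetworkMode": "default",
            "Env": [f"NetworkId={network_id}"],
            "Binds": [bind],
        }
    }


# Print fragments that can be pasted into the module settings.
print(json.dumps({"Labels": edge_agent_labels()}, indent=4))
print(json.dumps(agent_create_options("linux"), indent=4))
```

If the device config.yaml uses a different network name, adjust the NetworkId value in the sketch accordingly.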

Log collection

  1. Log collection is disabled by default in the Datadog Agent. Enable it by configuring your Datadog Agent custom module:

    • Under "Environment Variables", set the DD_LOGS_ENABLED environment variable:

      DD_LOGS_ENABLED: true
      
  2. Configure the Edge Agent and Edge Hub modules. Under "Create Options", add the following label:

    "Labels": {
        "com.datadoghq.ad.logs": "[{\"source\": \"azure.iot_edge\", \"service\": \"<SERVICE>\"}]",
        "...": "..."
    }
    

    Change the service value based on your environment.

    Repeat this operation for any custom modules you want to collect logs from.

  3. Save and deploy the changes to your device configuration.
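
Since the same log label has to be added to several modules, it can help to generate it rather than type it repeatedly. The sketch below assumes Python is available on a workstation; log_label is a hypothetical helper, and the service names in the modules mapping are placeholders to replace with your own.

```python
import json


def log_label(service, source="azure.iot_edge"):
    """Build the com.datadoghq.ad.logs label for one module."""
    # The label value is a JSON list serialized into a string.
    return {
        "com.datadoghq.ad.logs": json.dumps(
            [{"source": source, "service": service}]
        )
    }


# Placeholder service names; choose values that fit your environment.
modules = {"edgeAgent": "edge-agent", "edgeHub": "edge-hub"}

for module, service in modules.items():
    print(module, json.dumps({"Labels": log_label(service)}))
```

Merge the generated label with any labels already present in the module's "Create Options" rather than replacing them.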

Validation

Once the Agent is deployed to the device, run the Agent's status subcommand and look for azure_iot_edge under the Checks section.
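
As one way to script this check, the sketch below runs the status subcommand inside the Agent container and looks for the check name in the output. It assumes a Linux device where the docker CLI is available and the module container is named datadog-agent (the example module name used earlier); both function names are hypothetical helpers, and the match is a plain substring test rather than a parse of the status layout.

```python
import subprocess


def agent_status(container="datadog-agent"):
    """Return the text printed by `agent status` inside the Agent container."""
    result = subprocess.run(
        ["docker", "exec", container, "agent", "status"],
        capture_output=True,
        text=True,
        check=True,  # raise if the command exits with a non-zero status
    )
    return result.stdout


def check_is_listed(status_output, check_name="azure_iot_edge"):
    """Return True if the check name appears anywhere in the status output."""
    return check_name in status_output


# Usage on the device (not executed here):
#   print(check_is_listed(agent_status()))
```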

Data Collected

Metrics

azure.iot_edge.edge_agent.available_disk_space_bytes
(gauge)
Amount of space left on the disk disk_name.
Shown as byte
azure.iot_edge.edge_agent.command_latency_seconds.count
(gauge)
Count of how long it took for Docker to execute the given command. Possible commands are: create, update, remove, start, stop, restart.
azure.iot_edge.edge_agent.command_latency_seconds.quantile
(gauge)
Quantile of how long it took for Docker to execute the given command. Possible commands are: create, update, remove, start, stop, restart.
Shown as second
azure.iot_edge.edge_agent.command_latency_seconds.sum
(gauge)
Sum of how long it took for Docker to execute the given command. Possible commands are: create, update, remove, start, stop, restart.
Shown as second
azure.iot_edge.edge_agent.created_pids_total
(gauge)
Total number of processes the module module_name has created.
azure.iot_edge.edge_agent.deployment_time_seconds.count
(gauge)
Count of amount of time it took to complete a new deployment after receiving a change.
azure.iot_edge.edge_agent.deployment_time_seconds.quantile
(gauge)
Quantile of amount of time it took to complete a new deployment after receiving a change.
Shown as second
azure.iot_edge.edge_agent.deployment_time_seconds.sum
(gauge)
Sum of amount of time it took to complete a new deployment after receiving a change.
Shown as second
azure.iot_edge.edge_agent.direct_method_invocations_count
(count)
Total number of times a built-in Edge Agent direct method is called, such as Ping or Restart.
azure.iot_edge.edge_agent.host_uptime_seconds
(gauge)
How long the host has been running.
Shown as second
azure.iot_edge.edge_agent.iotedged_uptime_seconds
(gauge)
How long iotedged has been running.
Shown as second
azure.iot_edge.edge_agent.iothub_syncs_total
(count)
Total number of times the Edge Agent attempted to sync its twin with IoT Hub, both successful and unsuccessful. Includes both Edge Agent requesting a twin, and IoT Hub notifying of a twin update.
azure.iot_edge.edge_agent.module_start_total
(count)
Number of times the Edge Agent asked Docker to start the module module_name.
azure.iot_edge.edge_agent.module_stop_total
(count)
Number of times the Edge Agent asked Docker to stop the module module_name.
azure.iot_edge.edge_agent.total_disk_read_bytes
(count)
Total amount of bytes read from the disk by module module_name.
Shown as byte
azure.iot_edge.edge_agent.total_disk_space_bytes
(gauge)
Size of the disk disk_name.
Shown as byte
azure.iot_edge.edge_agent.total_disk_write_bytes
(count)
Total amount of bytes written to the disk by module module_name.
Shown as byte
azure.iot_edge.edge_agent.total_memory_bytes
(gauge)
Total amount of RAM available to module module_name.
Shown as byte
azure.iot_edge.edge_agent.total_network_in_bytes
(count)
Total amount of bytes received from the network by module module_name.
Shown as byte
azure.iot_edge.edge_agent.total_network_out_bytes
(count)
Total amount of bytes sent to the network by module module_name.
Shown as byte
azure.iot_edge.edge_agent.total_time_expected_running_seconds
(gauge)
The amount of time the module module_name was specified in the deployment.
azure.iot_edge.edge_agent.total_time_running_correctly_seconds
(gauge)
The amount of time the module module_name was specified in the deployment and was in the running state.
azure.iot_edge.edge_agent.unsuccessful_iothub_syncs_total
(count)
Total number of times the Edge Agent failed to sync its twin with IoT Hub.
azure.iot_edge.edge_agent.used_cpu_percent.count
(gauge)
Count of percent of CPU used by all processes in module module_name.
azure.iot_edge.edge_agent.used_cpu_percent.quantile
(gauge)
Quantile of percent of CPU used by all processes in module module_name.
Shown as percent
azure.iot_edge.edge_agent.used_cpu_percent.sum
(gauge)
Sum of percent of CPU used by all processes in module module_name.
Shown as percent
azure.iot_edge.edge_agent.used_memory_bytes
(gauge)
Amount of RAM used by all processes in module module_name.
Shown as byte
azure.iot_edge.edge_hub.client_connect_failed_total
(count)
Total number of times clients failed to connect to Edge Hub.
azure.iot_edge.edge_hub.direct_method_duration_seconds.count
(gauge)
Count of time taken to resolve a direct message.
azure.iot_edge.edge_hub.direct_method_duration_seconds.quantile
(gauge)
Quantile of time taken to resolve a direct message.
Shown as second
azure.iot_edge.edge_hub.direct_method_duration_seconds.sum
(gauge)
Sum of time taken to resolve a direct message.
Shown as second
azure.iot_edge.edge_hub.direct_methods_total
(count)
Total number of direct messages sent.
azure.iot_edge.edge_hub.gettwin_duration_seconds.count
(gauge)
Count of time taken for get twin operations.
azure.iot_edge.edge_hub.gettwin_duration_seconds.quantile
(gauge)
Quantile of time taken for get twin operations.
Shown as second
azure.iot_edge.edge_hub.gettwin_duration_seconds.sum
(gauge)
Sum of time taken for get twin operations.
Shown as second
azure.iot_edge.edge_hub.gettwin_total
(count)
Total number of GetTwin calls.
azure.iot_edge.edge_hub.message_process_duration_seconds.count
(gauge)
Count of time taken to process a message from the queue.
azure.iot_edge.edge_hub.message_process_duration_seconds.quantile
(gauge)
Quantile of time taken to process a message from the queue.
Shown as second
azure.iot_edge.edge_hub.message_process_duration_seconds.sum
(gauge)
Sum of time taken to process a message from the queue.
Shown as second
azure.iot_edge.edge_hub.message_send_duration_seconds.count
(gauge)
Count of time taken to send a message.
azure.iot_edge.edge_hub.message_send_duration_seconds.quantile
(gauge)
Quantile of time taken to send a message.
Shown as second
azure.iot_edge.edge_hub.message_send_duration_seconds.sum
(gauge)
Sum of time taken to send a message.
Shown as second
azure.iot_edge.edge_hub.message_size_bytes.count
(gauge)
Count of message size from clients.
azure.iot_edge.edge_hub.message_size_bytes.quantile
(gauge)
Quantile of message size from clients.
Shown as byte
azure.iot_edge.edge_hub.message_size_bytes.sum
(gauge)
Sum of message size from clients.
Shown as byte
azure.iot_edge.edge_hub.messages_dropped_total
(count)
Total number of messages removed because of reason.
azure.iot_edge.edge_hub.messages_received_total
(count)
Total number of messages received from clients.
azure.iot_edge.edge_hub.messages_sent_total
(count)
Total number of messages sent to clients or upstream.
azure.iot_edge.edge_hub.messages_unack_total
(count)
Total number of messages left unacknowledged because of storage failure.
azure.iot_edge.edge_hub.offline_count_total
(count)
Total number of times Edge Hub went offline.
azure.iot_edge.edge_hub.offline_duration_seconds.count
(gauge)
Count of time Edge Hub was offline.
azure.iot_edge.edge_hub.offline_duration_seconds.quantile
(gauge)
Quantile of time Edge Hub was offline.
Shown as second
azure.iot_edge.edge_hub.offline_duration_seconds.sum
(gauge)
Sum of time Edge Hub was offline.
Shown as second
azure.iot_edge.edge_hub.operation_retry_total
(count)
Total number of times Edge operations were retried.
azure.iot_edge.edge_hub.queue_length
(gauge)
Current length of Edge Hub's queue for a given priority.
azure.iot_edge.edge_hub.reported_properties_total
(count)
Total reported property updates calls.
azure.iot_edge.edge_hub.reported_properties_update_duration_seconds.count
(gauge)
Count of time taken to update reported properties.
azure.iot_edge.edge_hub.reported_properties_update_duration_seconds.quantile
(gauge)
Quantile of time taken to update reported properties.
Shown as second
azure.iot_edge.edge_hub.reported_properties_update_duration_seconds.sum
(gauge)
Sum of time taken to update reported properties.
Shown as second
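
Several of the metrics above come in .sum and .count pairs, and the two total_time_* metrics describe expected versus actual running time. The sketch below shows two derived values that can be computed from them: a mean per observation and a module availability ratio. The helper names are hypothetical, the input numbers are made up for illustration, and the sketch assumes both values in each pair come from the same collection interval.

```python
def mean_from_summary(sum_value, count_value):
    """Mean observation from a .sum/.count pair, or None with no observations."""
    if count_value == 0:
        return None
    return sum_value / count_value


def availability(running_correctly_seconds, expected_running_seconds):
    """Fraction of the expected running time the module was actually running."""
    if expected_running_seconds <= 0:
        return None
    return running_correctly_seconds / expected_running_seconds


# Illustrative values only, for example from
# edge_hub.message_send_duration_seconds.sum and .count:
print(mean_from_summary(6.0, 3))
# and from edge_agent.total_time_running_correctly_seconds and
# edge_agent.total_time_expected_running_seconds:
print(availability(45.0, 60.0))
```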

Events

Azure IoT Edge does not include any events.

Service Checks

azure.iot_edge.edge_agent.prometheus.health
Returns CRITICAL if the Agent is unable to reach the Edge Agent metrics Prometheus endpoint. Returns OK otherwise.
Statuses: ok, critical

azure.iot_edge.edge_hub.prometheus.health
Returns CRITICAL if the Agent is unable to reach the Edge Hub metrics Prometheus endpoint. Returns OK otherwise.
Statuses: ok, critical

Troubleshooting

Need help? Contact Datadog support.
