Azure IoT Edge

Supported OS Linux Windows Mac OS

Integration version4.2.1

Overview

Azure IoT Edge is a fully managed service to deploy Cloud workloads to run on Internet of Things (IoT) Edge devices using standard containers.

Use the Datadog-Azure IoT Edge integration to collect metrics and health status from IoT Edge devices.

Note: This integration requires IoT Edge runtime version 1.0.10 or above.

Setup

Follow the instructions below to install and configure this check for an IoT Edge device running on a device host.

Installation

The Azure IoT Edge check is included in the Datadog Agent package.

No additional installation is needed on your device.

Configuration

Configure the IoT Edge device so that the Agent runs as a custom module. Follow the Microsoft documentation on deploying Azure IoT Edge modules for information on installing and working with custom modules for Azure IoT Edge.

Follow the steps below to configure the IoT Edge device, runtime modules, and the Datadog Agent to start collecting IoT Edge metrics.

  1. Configure the Edge Agent runtime module as follows:

    • Image version must be 1.0.10 or above.

    • Under “Create Options”, add the following Labels. Edit the com.datadoghq.ad.instances label as appropriate. See the sample azure_iot_edge.d/conf.yaml for all available configuration options. See the documentation on Docker Integrations Autodiscovery for more information on labels-based integration configuration.

      "Labels": {
          "com.datadoghq.ad.check_names": "[\"azure_iot_edge\"]",
          "com.datadoghq.ad.init_configs": "[{}]",
          "com.datadoghq.ad.instances": "[{\"edge_hub_prometheus_url\": \"http://edgeHub:9600/metrics\", \"edge_agent_prometheus_url\": \"http://edgeAgent:9600/metrics\"}]"
      }
      
  2. Configure the Edge Hub runtime module as follows:

    • Image version must be 1.0.10 or above.
  3. Install and configure the Datadog Agent as a custom module:

    • Set the module name. For example: datadog-agent.

    • Set the Agent image URI. For example: datadog/agent:7.

    • Under “Environment Variables”, configure your DD_API_KEY. You may also set extra Agent configuration here (see Agent Environment Variables).

    • Under “Container Create Options”, enter the following configuration based on your device OS. Note: NetworkId must correspond to the network name set in the device config.yaml file.

      • Linux:
        {
            "HostConfig": {
                "NetworkMode": "default",
                "Env": ["NetworkId=azure-iot-edge"],
                "Binds": ["/var/run/docker.sock:/var/run/docker.sock"]
            }
        }
        
      • Windows:
        {
            "HostConfig": {
                "NetworkMode": "default",
                "Env": ["NetworkId=nat"],
                "Binds": ["//./pipe/iotedge_moby_engine:/./pipe/docker_engine"]
            }
        }
        
    • Save the Datadog Agent custom module.

  4. Save and deploy changes to your device configuration.

Log collection

  1. Collecting logs is disabled by default in the Datadog Agent, enable it by configuring your Datadog Agent custom module:

    • Under “Environment Variables”, set the DD_LOGS_ENABLED environment variable:

      DD_LOGS_ENABLED: true
      
  2. Configure the Edge Agent and Edge Hub modules: under “Create Options”, add the following label:

    "Labels": {
        "com.datadoghq.ad.logs": "[{\"source\": \"azure.iot_edge\", \"service\": \"<SERVICE>\"}]",
        "...": "..."
    }
    

    Change the service based on your environment.

    Repeat this operation for any custom modules you’d like to collect logs for.

  3. Save and deploy changes to your device configuration.

Validation

Once the Agent has been deployed to the device, run the Agent’s status subcommand and look for azure_iot_edge under the Checks section.

Data Collected

Metrics

azure.iot_edge.edge_agent.available_disk_space_bytes
(gauge)
Amount of space left on the disk `disk_name.
Shown as byte
azure.iot_edge.edge_agent.command_latency_seconds.count
(gauge)
Count of how long it took for Docker to execute the given command. Possible commands are: create, update, remove, start, stop, restart.
azure.iot_edge.edge_agent.command_latency_seconds.quantile
(gauge)
Quantile of how long it took for Docker to execute the given command. Possible commands are: create, update, remove, start, stop, restart.
Shown as second
azure.iot_edge.edge_agent.command_latency_seconds.sum
(gauge)
Sum of how long it took for Docker to execute the given command. Possible commands are: create, update, remove, start, stop, restart.
Shown as second
azure.iot_edge.edge_agent.created_pids_total
(gauge)
Total number of processes the module module_name has created.
azure.iot_edge.edge_agent.deployment_time_seconds.count
(gauge)
Count of amount of time it took to complete a new deployment after receiving a change.
azure.iot_edge.edge_agent.deployment_time_seconds.quantile
(gauge)
Quantile of amount of time it took to complete a new deployment after receiving a change.
Shown as second
azure.iot_edge.edge_agent.deployment_time_seconds.sum
(gauge)
Sum of amount of time it took to complete a new deployment after receiving a change.
Shown as second
azure.iot_edge.edge_agent.direct_method_invocations_count
(count)
Total number of times a built-in Edge Agent direct method is called, such as Ping or Restart.
azure.iot_edge.edge_agent.host_uptime_seconds
(gauge)
How long the host has been running.
Shown as second
azure.iot_edge.edge_agent.iotedged_uptime_seconds
(gauge)
How long iotedged has been running.
Shown as second
azure.iot_edge.edge_agent.iothub_syncs_total
(count)
Total number of times the Edge Agent attempted to sync its twin with IoT Hub, both successful and unsuccessful. Includes both Edge Agent requesting a twin, and IoT Hub notifying of a twin update.
azure.iot_edge.edge_agent.module_start_total
(count)
Number of times the Edge Agent asked Docker to start the module module_name.
azure.iot_edge.edge_agent.module_stop_total
(count)
Number of times the Edge Agent asked Docker to stop the module module_name.
azure.iot_edge.edge_agent.total_disk_read_bytes
(count)
Total amount of bytes read from the disk by module module_name.
Shown as byte
azure.iot_edge.edge_agent.total_disk_space_bytes
(gauge)
Size of the disk `disk_name.
Shown as byte
azure.iot_edge.edge_agent.total_disk_write_bytes
(count)
Total amount of bytes written to the disk by module module_name.
Shown as byte
azure.iot_edge.edge_agent.total_memory_bytes
(gauge)
Total amount of RAM available to module module_name.
Shown as byte
azure.iot_edge.edge_agent.total_network_in_bytes
(count)
Total amount of bytes received from the network by module module_name.
Shown as byte
azure.iot_edge.edge_agent.total_network_out_bytes
(count)
Total amount of bytes sent to the network by module module_name.
Shown as byte
azure.iot_edge.edge_agent.total_time_expected_running_seconds
(gauge)
The amount of time the module module_name was specified in the deployment.
azure.iot_edge.edge_agent.total_time_running_correctly_seconds
(gauge)
The amount of time the module module_name was specified in the deployment and was in the running state.
azure.iot_edge.edge_agent.unsuccessful_iothub_syncs_total
(count)
Total number of times the Edge Agent failed to sync its twin with IoT Hub.
azure.iot_edge.edge_agent.used_cpu_percent.count
(gauge)
Count of percent of CPU used by all processes in module module_name.
azure.iot_edge.edge_agent.used_cpu_percent.quantile
(gauge)
Quantile of percent of CPU used by all processes in module module_name.
Shown as percent
azure.iot_edge.edge_agent.used_cpu_percent.sum
(gauge)
Sum of percent of CPU used by all processes in module module_name.
Shown as percent
azure.iot_edge.edge_agent.used_memory_bytes
(gauge)
Amount of RAM used by all processes in module module_name.
Shown as byte
azure.iot_edge.edge_hub.client_connect_failed_total
(count)
Total number of times clients failed to connect to Edge Hub.
azure.iot_edge.edge_hub.direct_method_duration_seconds.count
(gauge)
Count of time taken to resolve a direct message.
azure.iot_edge.edge_hub.direct_method_duration_seconds.quantile
(gauge)
Quantile of time taken to resolve a direct message.
Shown as second
azure.iot_edge.edge_hub.direct_method_duration_seconds.sum
(gauge)
Sum of time taken to resolve a direct message.
Shown as second
azure.iot_edge.edge_hub.direct_methods_total
(count)
Total number of direct messages sent.
azure.iot_edge.edge_hub.gettwin_duration_seconds.count
(gauge)
Count of time taken for get twin operations.
azure.iot_edge.edge_hub.gettwin_duration_seconds.quantile
(gauge)
Quantile of time taken for get twin operations.
Shown as second
azure.iot_edge.edge_hub.gettwin_duration_seconds.sum
(gauge)
Sum of time taken for get twin operations.
Shown as second
azure.iot_edge.edge_hub.gettwin_total
(count)
Total number of GetTwin calls.
azure.iot_edge.edge_hub.message_process_duration_seconds.count
(gauge)
Count of time taken to process a message from the queue.
azure.iot_edge.edge_hub.message_process_duration_seconds.quantile
(gauge)
Quantile of time taken to process a message from the queue.
Shown as second
azure.iot_edge.edge_hub.message_process_duration_seconds.sum
(gauge)
Sum of time taken to process a message from the queue.
Shown as second
azure.iot_edge.edge_hub.message_send_duration_seconds.count
(gauge)
Count of time taken to send a message.
azure.iot_edge.edge_hub.message_send_duration_seconds.quantile
(gauge)
Quantile of time taken to send a message.
Shown as second
azure.iot_edge.edge_hub.message_send_duration_seconds.sum
(gauge)
Sum of time taken to send a message.
Shown as second
azure.iot_edge.edge_hub.message_size_bytes.count
(gauge)
Count of message size from clients.
azure.iot_edge.edge_hub.message_size_bytes.quantile
(gauge)
Quantile of message size from clients.
Shown as byte
azure.iot_edge.edge_hub.message_size_bytes.sum
(gauge)
Sum of message size from clients.
Shown as byte
azure.iot_edge.edge_hub.messages_dropped_total
(count)
Total number of messages removed because of reason.
azure.iot_edge.edge_hub.messages_received_total
(count)
Total number of messages received from clients.
azure.iot_edge.edge_hub.messages_sent_total
(count)
Total number of messages sent to clients of upstream.
azure.iot_edge.edge_hub.messages_unack_total
(count)
Total number of messages unack because of storage failure.
azure.iot_edge.edge_hub.offline_count_total
(count)
Total number of times Edge Hub went offline.
azure.iot_edge.edge_hub.offline_duration_seconds.count
(gauge)
Count of time Edge Hub was offline.
azure.iot_edge.edge_hub.offline_duration_seconds.quantile
(gauge)
Quantile of time Edge Hub was offline.
Shown as second
azure.iot_edge.edge_hub.offline_duration_seconds.sum
(gauge)
Sum of time Edge Hub was offline.
Shown as second
azure.iot_edge.edge_hub.operation_retry_total
(count)
Total number of times Edge operations were retried.
azure.iot_edge.edge_hub.queue_length
(gauge)
Current length of Edge Hub's queue for a given priority.
azure.iot_edge.edge_hub.reported_properties_total
(count)
Total reported property updates calls.
azure.iot_edge.edge_hub.reported_properties_update_duration_seconds.count
(gauge)
Count of time taken to update reported properties.
azure.iot_edge.edge_hub.reported_properties_update_duration_seconds.quantile
(gauge)
Quantile of time taken to update reported properties.
Shown as second
azure.iot_edge.edge_hub.reported_properties_update_duration_seconds.sum
(gauge)
Sum of time taken to update reported properties.
Shown as second

Events

Azure IoT Edge does not include any events.

Service Checks

azure.iot_edge.edge_agent.prometheus.health
Returns CRITICAL if the Agent is unable to reach the Edge Agent metrics Prometheus endpoint. Returns OK otherwise.
Statuses: ok, critical

azure.iot_edge.edge_hub.prometheus.health
Returns CRITICAL if the Agent is unable to reach the Edge Hub metrics Prometheus endpoint. Returns OK otherwise.
Statuses: ok, critical

Troubleshooting

Need help? Contact Datadog support.

Further Reading

PREVIEWING: may/unit-testing