Overview

Amazon ECS is a scalable, high-performance container orchestration service that supports Docker containers. With the Datadog Agent, you can monitor ECS containers and tasks on every EC2 instance in your cluster.

If you want to monitor ECS on Fargate, see Amazon ECS on AWS Fargate.

Setup

To monitor your ECS containers and tasks, deploy the Datadog Agent as a container once on each EC2 instance in your ECS cluster. You can do this by creating a task definition for the Datadog Agent container and deploying it as a daemon service. Each Datadog Agent container then monitors the other containers on its respective EC2 instance.

The following instructions assume that you have configured an EC2 cluster. See the Amazon ECS documentation for creating a cluster.

  1. Create and add an ECS task definition
  2. Schedule the Datadog Agent as a daemon service
  3. (Optional) Set up additional Datadog Agent features

Note: Datadog’s Autodiscovery can be used in conjunction with ECS and Docker to automatically discover and monitor running tasks in your environment.

Create an ECS task definition

This ECS task definition launches the Datadog Agent container with the necessary configurations. When you need to modify the Agent configuration, update this task definition and redeploy the daemon service. You can configure this task definition with either the AWS Management Console or the AWS CLI.

The following sample is a minimal configuration for core infrastructure monitoring. Additional task definition samples with various features enabled are provided in the Set up additional Agent features section.

Create and manage the task definition file

  1. For Linux containers, download datadog-agent-ecs.json.

    This file provides a minimal configuration for core infrastructure monitoring. For more sample task definition files with various features enabled, see the Set up additional Agent features section on this page.
  2. Edit your base task definition file:

    • Set the DD_API_KEY environment variable by replacing <YOUR_DATADOG_API_KEY> with the Datadog API key for your account. Alternatively, supply the ARN of a secret stored in AWS Secrets Manager.

    • Set the DD_SITE environment variable to your Datadog site. If DD_SITE is not set, it defaults to the US1 site, datadoghq.com.

    • Optionally, add a DD_TAGS environment variable to specify any additional tags.

  3. (Optional) To deploy on an ECS Anywhere cluster, add the following line to your ECS task definition:

    "requiresCompatibilities": ["EXTERNAL"]
    
  4. (Optional) To add an Agent health check, add the following block to the Agent container definition in your ECS task definition:

    "healthCheck": {
      "retries": 3,
      "command": ["CMD-SHELL","agent health"],
      "timeout": 5,
      "interval": 30,
      "startPeriod": 15
    }
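
The edits in step 2 can also be scripted. The following is a minimal Python sketch, not an official Datadog tool. It assumes the Agent container is named datadog-agent and that the task definition uses the environment list layout shown in the samples on this page. The helper name set_env and the example tag values are hypothetical.

```python
import json


def set_env(task_def, container_name, name, value):
    """Set or replace one environment variable on the named container."""
    for container in task_def.get("containerDefinitions", []):
        if container.get("name") != container_name:
            continue
        env = container.setdefault("environment", [])
        for entry in env:
            if entry.get("name") == name:
                entry["value"] = value  # variable already present: overwrite it
                return task_def
        env.append({"name": name, "value": value})
        return task_def
    raise ValueError(f"no container named {container_name!r}")


# Stand-in for the contents of datadog-agent-ecs.json (assumed shape).
task_def = {
    "family": "datadog-agent-task",
    "containerDefinitions": [
        {
            "name": "datadog-agent",
            "environment": [
                {"name": "DD_API_KEY", "value": "<YOUR_DATADOG_API_KEY>"}
            ],
        }
    ],
}

set_env(task_def, "datadog-agent", "DD_SITE", "datadoghq.com")
# Example values only; DD_TAGS takes space-separated key:value pairs.
set_env(task_def, "datadog-agent", "DD_TAGS", "env:dev team:platform")
print(json.dumps(task_def, indent=2))
```

In practice, you would load the dictionary from your task definition file with json.load and write it back with json.dump before registering it.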
    

Register the task definition

After you have created your task definition file, execute the following command to register the file in AWS.

aws ecs register-task-definition --cli-input-json file://<path to datadog-agent-ecs.json>

Alternatively, after you have created your task definition file, use the AWS Console to register it.

  1. Log in to your AWS Console and navigate to the Elastic Container Service section.
  2. Select Task Definitions in the navigation pane. On the Create new task definition menu, select Create new task definition with JSON.
  3. In the JSON editor box, paste the contents of your task definition file.
  4. Select Create.

Run the Agent as a daemon service

To have one Datadog Agent container running on each EC2 instance, run the Datadog Agent task definition as a daemon service.

Schedule a daemon service in AWS using Datadog’s ECS task

  1. Log in to the AWS Console and navigate to the ECS section. On the Clusters page, choose the cluster you run the Agent on.
  2. On your cluster’s Services tab, select Create.
  3. Under Deployment configuration, for Service type, select Daemon.
  4. You do not need to configure load balancing or autoscaling.
  5. Click Next Step, and then Create Service.
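
If you prefer to script this step instead of using the console, the same daemon service can be requested through the ECS API. The following is a sketch using boto3 that assumes boto3 is installed and AWS credentials are configured. The cluster, service, and task family names are hypothetical placeholders, and the helper function names are illustrative.

```python
def daemon_service_request(cluster, service_name, task_definition):
    """Build keyword arguments for the ECS CreateService API call."""
    return {
        "cluster": cluster,
        "serviceName": service_name,
        "taskDefinition": task_definition,
        # DAEMON places one task on each active container instance,
        # so no desired count is specified.
        "schedulingStrategy": "DAEMON",
    }


def create_daemon_service(cluster, service_name, task_definition):
    """Create the service; needs boto3 and AWS credentials at call time."""
    import boto3  # imported lazily so the builder above has no dependencies

    ecs = boto3.client("ecs")
    return ecs.create_service(
        **daemon_service_request(cluster, service_name, task_definition)
    )


# Hypothetical names; replace them with your own cluster and task family.
request = daemon_service_request("my-cluster", "datadog-agent", "datadog-agent-task")
print(request)
```

Keeping the request builder separate from the API call makes it possible to review or log the request before anything is created.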

Set up additional Agent features

The task definition files provided in the previous section are minimal. These files deploy an Agent container with a base configuration to collect core metrics about the containers in your ECS cluster. The Agent can also run Agent integrations based on Docker Labels discovered on your containers.

For additional features:

APM

Consult the APM setup documentation and the sample datadog-agent-ecs-apm.json.

Log Management

Consult the Log collection documentation and the sample datadog-agent-ecs-logs.json.

DogStatsD

If you’re using DogStatsD, edit your Datadog Agent’s container definition to add a host port mapping for 8125/udp and set the environment variable DD_DOGSTATSD_NON_LOCAL_TRAFFIC to true:

{
 "containerDefinitions": [
  {
   "name": "datadog-agent",
   (...)
   "portMappings": [
     {
      "hostPort": 8125,
      "protocol": "udp",
      "containerPort": 8125
     }
   ],
   "environment" : [
     {
       "name": "DD_API_KEY",
       "value": "<YOUR_DATADOG_API_KEY>"
     },
     {
       "name": "DD_SITE",
       "value": "datadoghq.com"
     },
     {
       "name": "DD_DOGSTATSD_NON_LOCAL_TRAFFIC",
       "value": "true"
     }
   ]
  }
 ],
 (...)
}

This setup allows DogStatsD traffic to be routed from the application containers, through the host and host port, to the Datadog Agent container. However, the application container must use the host’s private IP address for this traffic. You can enable this by setting the environment variable DD_AGENT_HOST to the private IP address of the EC2 instance, which you can retrieve from the Instance Metadata Service (IMDS). Alternatively, you can set the Agent host in your application code during initialization.

The implementation for DogStatsD is the same as for APM. See Configure the Trace Agent endpoint for examples of setting the Agent endpoint.
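
To illustrate the routing described above, here is a minimal Python sketch that resolves the Agent host and sends one DogStatsD datagram over UDP. It is not the Datadog client library, which you would normally use instead. The function names and the example metric are hypothetical; the IMDSv2 token and local-ipv4 endpoints are the standard EC2 metadata paths.

```python
import os
import socket
import urllib.request

IMDS = "http://169.254.169.254"


def host_private_ip(timeout=2):
    """Fetch the instance's private IPv4 address from IMDSv2.

    From inside a bridge-mode container, the token request may require the
    instance's metadata hop limit to be raised above the default of 1.
    """
    token_req = urllib.request.Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    ip_req = urllib.request.Request(
        f"{IMDS}/latest/meta-data/local-ipv4",
        headers={"X-aws-ec2-metadata-token": token},
    )
    with urllib.request.urlopen(ip_req, timeout=timeout) as resp:
        return resp.read().decode().strip()


def resolve_agent_host():
    """Prefer DD_AGENT_HOST when it is set; otherwise ask IMDS."""
    return os.environ.get("DD_AGENT_HOST") or host_private_ip()


def format_metric(name, value, metric_type, tags=()):
    """Build a DogStatsD datagram: <name>:<value>|<type>|#<tag>,<tag>."""
    datagram = f"{name}:{value}|{metric_type}"
    if tags:
        datagram += "|#" + ",".join(tags)
    return datagram


def send_metric(agent_host, datagram, port=8125):
    """Send one datagram over UDP to the Agent's mapped host port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode(), (agent_host, port))


# On an EC2 instance you could call:
#   send_metric(resolve_agent_host(), format_metric("app.requests", 1, "c", ["env:dev"]))
print(format_metric("app.requests", 1, "c", ["env:dev"]))
```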

Ensure that the security group settings on your EC2 instances do not publicly expose the ports for APM and DogStatsD.

Process collection

To collect Live Process information for all your containers and send it to Datadog, update your task definition with the DD_PROCESS_AGENT_ENABLED environment variable:

{
 "containerDefinitions": [
  {
   "name": "datadog-agent",
   (...)
   "environment" : [
     {
       "name": "DD_API_KEY",
       "value": "<YOUR_DATADOG_API_KEY>"
     },
     {
       "name": "DD_SITE",
       "value": "datadoghq.com"
     },
     {
       "name": "DD_PROCESS_AGENT_ENABLED",
       "value": "true"
     }
   ]
  }
 ],
 (...)
}

Network Performance Monitoring

This feature is only available for Linux.

Consult the sample datadog-agent-sysprobe-ecs.json file.

If you are using Amazon Linux 1 (AL1, formerly Amazon Linux AMI), consult datadog-agent-sysprobe-ecs1.json.

If you already have a task definition, update your file to include the following configuration:

{
  "containerDefinitions": [
    {
      (...)
      "mountPoints": [
        (...)
        {
          "containerPath": "/sys/kernel/debug",
          "sourceVolume": "debug"
        },
        (...)
      ],
      "environment": [
        (...)
        {
          "name": "DD_SYSTEM_PROBE_NETWORK_ENABLED",
          "value": "true"
        }
      ],
      "linuxParameters": {
        "capabilities": {
          "add": [
            "SYS_ADMIN",
            "SYS_RESOURCE",
            "SYS_PTRACE",
            "NET_ADMIN",
            "NET_BROADCAST",
            "NET_RAW",
            "IPC_LOCK",
            "CHOWN"
          ]
        }
      }
    }
  ],
  "requiresCompatibilities": [
   "EC2"
  ],
  "volumes": [
    (...)
    {
     "host": {
       "sourcePath": "/sys/kernel/debug"
     },
     "name": "debug"
    },
    (...)
  ],
  "family": "datadog-agent-task"
}
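
If you maintain an existing task definition, the additions above can be merged in programmatically. The following Python sketch is one way to do that idempotently. It assumes the Agent container is named datadog-agent; the function names are hypothetical, and the values are copied from the sample above.

```python
NPM_CAPABILITIES = [
    "SYS_ADMIN", "SYS_RESOURCE", "SYS_PTRACE", "NET_ADMIN",
    "NET_BROADCAST", "NET_RAW", "IPC_LOCK", "CHOWN",
]


def add_unique(items, new_item):
    """Append new_item only if an equal item is not already in the list."""
    if new_item not in items:
        items.append(new_item)


def enable_npm(task_def, container_name="datadog-agent"):
    """Merge the Network Performance Monitoring settings into task_def."""
    containers = [
        c for c in task_def.get("containerDefinitions", [])
        if c.get("name") == container_name
    ]
    if not containers:
        raise ValueError(f"no container named {container_name!r}")
    container = containers[0]

    add_unique(
        container.setdefault("mountPoints", []),
        {"containerPath": "/sys/kernel/debug", "sourceVolume": "debug"},
    )
    add_unique(
        container.setdefault("environment", []),
        {"name": "DD_SYSTEM_PROBE_NETWORK_ENABLED", "value": "true"},
    )
    caps = (
        container.setdefault("linuxParameters", {})
        .setdefault("capabilities", {})
        .setdefault("add", [])
    )
    for cap in NPM_CAPABILITIES:
        add_unique(caps, cap)

    # Task-level settings: the host volume backing the mount point.
    add_unique(
        task_def.setdefault("volumes", []),
        {"host": {"sourcePath": "/sys/kernel/debug"}, "name": "debug"},
    )
    add_unique(task_def.setdefault("requiresCompatibilities", []), "EC2")
    return task_def


# Minimal stand-in for an existing task definition (assumed shape).
existing = {
    "family": "datadog-agent-task",
    "containerDefinitions": [{"name": "datadog-agent"}],
}
enable_npm(existing)
print(sorted(existing.keys()))
```

Because each addition is checked before it is appended, running the function twice on the same task definition leaves it unchanged the second time.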

Network Path

Network Path for Datadog Network Performance Monitoring is in Preview. Reach out to your Datadog representative to sign up.

  1. To enable Network Path on your ECS clusters, enable the system-probe traceroute module by adding the following environment variable to your datadog-agent-sysprobe-ecs.json file:

       "environment": [
         (...)
         {
           "name": "DD_TRACEROUTE_ENABLED",
           "value": "true"
         }
       ],
    
  2. To monitor individual paths, follow the instructions in the Set up additional Agent features section on this page.

  3. To have the Agent automatically discover and monitor network paths based on actual network traffic, so that you do not need to specify endpoints manually, add the following environment variable to your datadog-agent-sysprobe-ecs.json file:

       "environment": [
         (...)
         {
           "name": "DD_NETWORK_PATH_CONNECTIONS_MONITORING_ENABLED",
           "value": "true"
         }
       ],
    
  4. Optionally, to configure the number of workers (default is 4), adjust the following environment variable in your datadog-agent-sysprobe-ecs.json file:

       "environment": [
         (...)
         {
           "name": "DD_NETWORK_PATH_COLLECTOR_WORKERS",
           "value": "10"
         }
       ],
    

AWSVPC mode

For Agent v6.10+, awsvpc mode is supported for application containers, provided that security groups are set to allow the host instance’s security group to reach the application containers on relevant ports.

You can run the Agent in awsvpc mode, but Datadog does not recommend this because it may be difficult to retrieve the ENI IP to reach the Agent for DogStatsD metrics and APM traces. Instead, run the Agent in bridge mode with port mapping to allow easier retrieval of host IP through the metadata server.

FIPS proxy for Datadog for Government environments

This feature is only available for Linux.

To send data to the Datadog for Government site, add the fips-proxy sidecar container and open container ports to ensure proper communication for supported features.

Note: You must also ensure that the sidecar container is configured with applicable network settings and IAM permissions.

 {
   "containerDefinitions": [
     (...)
          {
            "name": "fips-proxy",
            "image": "datadog/fips-proxy:1.1.5",
            "portMappings": [
                {
                    "containerPort": 9803,
                    "protocol": "tcp"
                },
                {
                    "containerPort": 9804,
                    "protocol": "tcp"
                },
                {
                    "containerPort": 9805,
                    "protocol": "tcp"
                },
                {
                    "containerPort": 9806,
                    "protocol": "tcp"
                },
                {
                    "containerPort": 9807,
                    "protocol": "tcp"
                },
                {
                    "containerPort": 9808,
                    "protocol": "tcp"
                },
                {
                    "containerPort": 9809,
                    "protocol": "tcp"
                },
                {
                    "containerPort": 9810,
                    "protocol": "tcp"
                },
                {
                    "containerPort": 9811,
                    "protocol": "tcp"
                },
                {
                    "containerPort": 9812,
                    "protocol": "tcp"
                },
                {
                    "containerPort": 9813,
                    "protocol": "tcp"
                },
                {
                    "containerPort": 9814,
                    "protocol": "tcp"
                },
                {
                    "containerPort": 9815,
                    "protocol": "tcp"
                },
                {
                    "containerPort": 9816,
                    "protocol": "tcp"
                },
                {
                    "containerPort": 9817,
                    "protocol": "tcp"
                },
                {
                    "containerPort": 9818,
                    "protocol": "tcp"
                }
            ],
            "essential": true,
            "environment": [
                {
                    "name": "DD_FIPS_PORT_RANGE_START",
                    "value": "9803"
                },
                {
                    "name": "DD_FIPS_LOCAL_ADDRESS",
                    "value": "127.0.0.1"
                }
            ]
        }
   ],
   "family": "datadog-agent-task"
}
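
The sixteen port mappings in the sample are consecutive and begin at the value of DD_FIPS_PORT_RANGE_START. If you generate task definitions programmatically, a short Python sketch like the following can produce the same list. The function name is hypothetical, and the count of 16 is taken from the sample above rather than from a documented constant.

```python
import json

FIPS_PORT_RANGE_START = 9803  # same value as DD_FIPS_PORT_RANGE_START
FIPS_PORT_COUNT = 16          # the sample maps ports 9803 through 9818


def fips_port_mappings(start=FIPS_PORT_RANGE_START, count=FIPS_PORT_COUNT):
    """Return one TCP port mapping per port in the consecutive range."""
    return [
        {"containerPort": port, "protocol": "tcp"}
        for port in range(start, start + count)
    ]


mappings = fips_port_mappings()
print(json.dumps(mappings[:2], indent=2))
```

If you change the range start, use the same value for DD_FIPS_PORT_RANGE_START in both the fips-proxy container and the Datadog Agent container.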

You also need to update the environment variables of the Datadog Agent’s container to enable sending traffic through the FIPS proxy:

{
    "containerDefinitions": [
        {
            "name": "datadog-agent",
            "image": "public.ecr.aws/datadog/agent:latest",
            (...)
            "environment": [
              (...)
                {
                    "name": "DD_FIPS_ENABLED",
                    "value": "true"
                },
                {
                    "name": "DD_FIPS_PORT_RANGE_START",
                    "value": "9803"
                },
                {
                    "name": "DD_FIPS_HTTPS",
                    "value": "false"
                }
            ]
        }
    ],
   "family": "datadog-agent-task"
}

Troubleshooting

Need help? Contact Datadog support.
