This page provides troubleshooting information for container monitoring.
There are three methods of deploying the Agent:
In a cloud environment, such as Amazon ECS, Fargate in an Amazon ECS environment, or Amazon EKS
These different methods present unique deployment challenges. Use this page as a starting point to resolve issues. If you continue to have trouble, reach out to Datadog support for further assistance.
For details on Agent release updates or changes, refer to Datadog’s release notes.
A useful way to inject environment variables or to configure a DogStatsD library is to implement the Admission Controller feature on the Cluster Agent. Note: The Cluster Agent must be deployed and running before the application is deployed.
Verify that the following are true:
The metrics endpoint is exposed and is open for the Agent to reach.
There are no proxies or firewalls that might impede the Agent from accessing the endpoint.
The Agent has Autodiscovery enabled.
There are two environment variables that can affect whether logs are collected and from which containers:
Set DD_LOGS_ENABLED to true to collect logs.
Set DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL to true to collect all logs from all containers.
To exclude logs (and other features) from collection, see the Container Discovery Management guide.
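As a quick check, the two variables can be inspected from a shell inside the Agent container. The following is a minimal sketch, not an official Datadog tool: the function name `check_log_env` is hypothetical, and it only reports whether each variable is literally set to `true` in the current environment.

```shell
# Hypothetical helper: report whether the two log-collection variables
# are set to "true" in the current shell environment.
check_log_env() {
  rc=0
  for name in DD_LOGS_ENABLED DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ "$value" = "true" ]; then
      echo "$name=true"
    else
      echo "$name is not set to true (current value: '$value')"
      rc=1
    fi
  done
  # Nonzero when at least one variable is missing or not "true".
  return "$rc"
}
```

Paste the function into a shell opened inside the Agent container and run `check_log_env`; a nonzero exit status means at least one of the two variables is not set to `true`.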
The most common error that prevents connection to the Kubelet API is a failure to verify the Kubelet TLS certificate.
TLS verification is enabled by default, and may prevent the Agent from connecting to the Kubelet API through HTTPS. You can disable TLS verification by using dedicated parameters or by setting the DD_KUBELET_TLS_VERIFY variable for all containers in the Agent manifest: set TLS_VERIFY to false.
First, ensure that the Cluster Agent is deployed and able to send data to the node Agent.
Then, review the query used to scale the external metrics in the Metrics Summary. Only valid queries autoscale. If there are multiple queries and any one of them is invalid, all of them are ignored.
When reaching out for further assistance for HPA metrics, provide the following to Datadog support:
The describe output of the HPA manifest:
$ kubectl describe hpa > hpa.log
The describe output of the DatadogMetric Custom Resource Definition:
$ kubectl describe DatadogMetric > DatadogMetric.log
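The two commands above can be wrapped in a small helper so that both files are produced in one step. This is a sketch under assumptions: the function name `collect_hpa_info` and its optional namespace argument are hypothetical, and the output file names simply match the ones used above.

```shell
# Hypothetical helper: save the HPA and DatadogMetric descriptions to
# hpa.log and DatadogMetric.log in the current directory.
# Optional first argument: a namespace (an assumption of this sketch).
collect_hpa_info() {
  if [ -n "${1:-}" ]; then
    # Turn the namespace argument into "-n <namespace>" for kubectl.
    set -- -n "$1"
  else
    # No namespace given: use the namespace of the current context.
    set --
  fi
  kubectl describe hpa "$@" > hpa.log &&
    kubectl describe DatadogMetric "$@" > DatadogMetric.log
}
```

For example, `collect_hpa_info` uses the namespace of the current context, while `collect_hpa_info <NAMESPACE>` targets a specific one. Attach both resulting files to the support ticket.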
For logs, make sure that the Agent deployment command has DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL and DD_LOGS_ENABLED enabled.
Ensure that your IAM policy is updated.
ECS: Ensure that the log router is attached to the container from which you would like to collect logs.
EKS: There are two common ways for the Agent to collect logs in an EKS Fargate environment: Log forwarding with CloudWatch logs, and log forwarding through Amazon Data Firehose. Using Amazon Data Firehose to collect logs requires the successful implementation of the Amazon Data Firehose delivery stream, as well as some command line tools.
First, ensure your API key is valid.
Then, in your node Agent Pod, run the agent status command and review the results.
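In a Kubernetes deployment, the status command can also be run from outside the pod with `kubectl exec`. The sketch below is illustrative only: the function name `agent_api_key_status` is hypothetical, and the `grep` filter assumes that the status output contains a line mentioning the API key, which may vary by Agent version.

```shell
# Hypothetical helper: run "agent status" in the given node Agent pod and
# keep only the lines that mention the API key.
# Assumption: the status output includes such a line.
agent_api_key_status() {
  pod="$1"
  kubectl exec "$pod" -- agent status | grep -i "api key"
}
```

Call it as `agent_api_key_status <AGENT_POD_NAME>`. If nothing is printed, review the full `agent status` output instead of relying on the filter.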
kube_apiserver, kube_controller_manager, or etcd metrics
On managed services such as Azure Kubernetes Service (AKS) and Google Kubernetes Engine (GKE), the user cannot access the control plane components. As a result, it is not possible to run the kube_apiserver, kube_controller_manager, kube_scheduler, or etcd checks in these environments.
[ENTRYPOINT][ERROR] Could not start the service: The service did not respond to the start or control request in a timely fashion.. Error: [1053 (0x41d)]
To avoid this error, make sure you’ve set a CPU units reservation of at least 512 for the Datadog Agent.
After you open a support ticket, you may be asked for the following types of information:
You can use the flare command to send troubleshooting information to Datadog support.
Node Agent flare
$ kubectl exec -it <AGENT_POD_NAME> -- agent flare <CASE_ID>
Cluster Agent flare
$ kubectl exec -it <CLUSTER_AGENT_POD_NAME> -- agent flare <CASE_ID>
The describe output of the pod gives the team insight into how the node Agent or Cluster Agent was deployed, what the most recent events were for the pod, and whether some properties (such as custom tags) were injected and applied to host metrics. The > <FILENAME>.yaml section of the command writes the output to a file that can be sent to Datadog support as an attachment:
$ kubectl describe pod <POD_NAME> > <FILENAME>.yaml
This is the file used to deploy the Agent in your environment. It informs Datadog of the tags configured, whether logs were enabled, and if certain containers are defined to be ignored.
If you deployed the Agent in a runtime environment, send Datadog support the command line used to deploy the Agent.
The three most common deployment methods are: Helm chart, DaemonSet, and Operator.
If you are experiencing missing or inaccurate metrics, Datadog support may ask for the cURL output of the node Agent trying to reach the metrics endpoint. Run the command from inside the Agent container; the output tells support whether the Agent has access to the metrics. Note: This is not possible in Fargate or on managed services.
$ kubectl exec -it <AGENT_POD_NAME> -- curl -k -v "<METRIC_ENDPOINT>"
$ docker exec -it <AGENT_CONTAINER_ID> curl -k -v "<METRIC_ENDPOINT>"