Mechanisms to Ensure Logs are Not Lost
The Datadog Agent has several mechanisms to ensure that no logs are lost.
Log rotation
When a file is rotated, the Agent keeps tailing the old file while starting to tail the newly created file in parallel.
Although the Agent continues to tail the old file, a timeout starts when the log is rotated. Any data still unread in the old file when the timeout expires is not read by the Agent and is lost. If this happens regularly, increase the timeout from its default of 60 seconds. Set the timeout with the logs_config.close_timeout setting in the Agent's main configuration file or with the DD_LOGS_CONFIG_CLOSE_TIMEOUT environment variable.
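The effect of the timeout can be illustrated with a small model. This is a simplified sketch, not the Agent's implementation: the function name split_by_close_timeout and the sample write times are hypothetical, and the model assumes that a line is lost only if it is written to the old file after the timeout has expired.

```python
def split_by_close_timeout(writes, close_timeout=60.0):
    """Split lines written to a rotated file into (read, lost).

    writes: list of (seconds_after_rotation, line) pairs (hypothetical data).
    A line written after close_timeout is treated as lost in this model.
    """
    read, lost = [], []
    for seconds_after_rotation, line in writes:
        if seconds_after_rotation <= close_timeout:
            read.append(line)
        else:
            lost.append(line)
    return read, lost


# A process keeps writing to the old file for 90 seconds after rotation.
writes = [(5, "a"), (45, "b"), (90, "c")]

read_default, lost_default = split_by_close_timeout(writes)
read_longer, lost_longer = split_by_close_timeout(writes, close_timeout=120)
```

With the default of 60 seconds the last line falls outside the window, while a timeout of 120 seconds covers all three writes.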
Network issues
File tailing
The Agent stores a pointer for each tailed file. If there is a network connection issue, the Agent stops sending logs until the connection is restored and automatically picks up where it stopped to ensure no logs are lost.
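The pointer mechanism can be sketched as follows. This is a minimal illustration of the idea, not the Agent's code: the OffsetTailer class, the send callback, and the in-memory file are all hypothetical. The key point is that the pointer advances only after a line has been sent successfully.

```python
import io


class OffsetTailer:
    """Hypothetical tailer that remembers how far it has successfully sent."""

    def __init__(self, stream):
        self.stream = stream
        self.offset = 0  # pointer: position of the first unsent byte

    def ship(self, send):
        """Send complete unsent lines; advance the pointer only on success."""
        self.stream.seek(self.offset)
        while True:
            line = self.stream.readline()
            if not line.endswith(b"\n"):
                break  # end of file, or a line that is not complete yet
            try:
                send(line)
            except ConnectionError:
                break  # network down: leave the pointer where it is
            self.offset += len(line)


network = {"up": False}  # hypothetical stand-in for connection state
sent = []


def send(line):
    if not network["up"]:
        raise ConnectionError("network unavailable")
    sent.append(line)


tailer = OffsetTailer(io.BytesIO(b"a\nb\nc\n"))
tailer.ship(send)        # network is down: nothing is sent, pointer stays at 0
offset_while_down = tailer.offset
network["up"] = True
tailer.ship(send)        # connection restored: resumes from the stored pointer
```

Because nothing is discarded while the connection is down, the second call delivers every line exactly once, starting from the stored position.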
Port listening
If the Agent is listening on a TCP or UDP port and faces a network issue, the logs are stored in a local buffer until the network is available again.
However, the buffer size is limited to avoid memory issues, and new logs are dropped when the buffer is full.
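The drop-when-full behavior can be sketched with a bounded queue. This is an illustrative model only: the BoundedLogBuffer class and its capacity of 2 are hypothetical, and the Agent's actual buffer limits are not stated here.

```python
from collections import deque


class BoundedLogBuffer:
    """Hypothetical fixed-capacity buffer that rejects new logs when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()
        self.dropped = 0

    def push(self, log):
        """Store a log; return False if it was dropped because the buffer is full."""
        if len(self.items) >= self.capacity:
            self.dropped += 1
            return False
        self.items.append(log)
        return True

    def drain(self, send):
        """Send buffered logs in order, removing each only after send returns."""
        while self.items:
            send(self.items[0])
            self.items.popleft()


buf = BoundedLogBuffer(capacity=2)
results = [buf.push(m) for m in ("m1", "m2", "m3")]  # network is down

delivered = []
buf.drain(delivered.append)  # network is back: flush what was kept
```

The explicit length check is deliberate: a deque created with maxlen would silently evict the oldest entry instead, which is the opposite of dropping new logs.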
Container logs
As with files, the Agent stores a pointer for each tailed container. When a network issue occurs, the Agent therefore knows which logs have not been sent yet.
However, if the tailed container is removed before the network is available again, its logs are no longer accessible.