Maintaining and running your Datadog installation

In the Plan and Build sections, you gained insights into setting goals, strategizing integrations, and constructing and iterating on the Datadog environment for smooth production use. Next, you’ll learn about the run phase, where you’ll manage a series of internal and external tasks to keep the Datadog installation running efficiently.

Service tasks

Reduce risk and increase adoption by rolling out new Datadog installations in stages. This section lists a sequence of releases that optimizes the user experience with Datadog. Because IT architectures vary widely, this guidance is high-level. Here are a few highlights:

Onboarding a new infrastructure instance

Infrastructure is the core element of IT and observability, and onboarding it is the primary and most frequent task for a Datadog administrator team. The platform is adaptable and offers tools to streamline most of this work. Begin by tailoring it to your specific environment. Your IT architecture might include components such as hypervisors, hyperscalers, and serverless infrastructure.

Recommendations:

Use Fleet Automation to remotely manage your Agents at scale. Continuously monitor your teams for new infrastructure requests, flag them early, and apply engineering resources to sensible expansions of your infrastructure offerings.

Onboarding a new application footprint

Adding an application to Datadog is a common task in the early days of Datadog administration. Develop an efficient mechanism that matches your local conditions to the requirements of Datadog. At a minimum, include the knowledge base items from the planning phase, along with these additional considerations:

  • The version tag from Unified Service Tagging drives many visualizations. Develop an automated, reliable, and compliant method for setting it so that these higher-value visualizations work.

  • Establishing a comprehensive service catalog pays off later. Service Catalog is central to the Datadog design pattern, and hosts the objects of governance, dependency, and service definition.

Recommendations:
Develop automatic version tagging integrated into your application build process. Focus on Service Catalog, and track readiness with its setup guidance.
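As one way to picture version tagging inside a build, the following minimal sketch derives a version string and produces the DD_SERVICE, DD_ENV, and DD_VERSION environment variables used by unified service tagging. The BUILD_VERSION variable, the fallback to a Git commit hash, and the example service name are assumptions for illustration, not part of this guide.

```python
import os
import subprocess


def resolve_version(default: str = "unknown") -> str:
    """Return a version string: a CI-provided value first, then the Git commit."""
    # BUILD_VERSION is a hypothetical variable that your CI system would set.
    ci_version = os.environ.get("BUILD_VERSION")
    if ci_version:
        return ci_version
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip() or default
    except (OSError, subprocess.CalledProcessError):
        # Git is missing or this is not a repository: fall back to the default.
        return default


def unified_service_tags(service: str, env: str, version: str) -> dict[str, str]:
    """Build the environment variables read for unified service tagging."""
    for name, value in (("service", service), ("env", env), ("version", version)):
        if not value:
            raise ValueError(f"{name} must not be empty")
    return {"DD_SERVICE": service, "DD_ENV": env, "DD_VERSION": version}


if __name__ == "__main__":
    # "checkout-api" and "staging" are placeholder values.
    tags = unified_service_tags("checkout-api", "staging", resolve_version())
    for key, value in tags.items():
        print(f"{key}={value}")
```

A build step could write these lines into the deployment environment so that every release carries a consistent version without manual input.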

Fielding technical issues

Because Datadog is delivered as a software-as-a-service platform, it demands little troubleshooting from you, the administrator. To help identify issues in the host Agent, use the datadog-agent status command. This command reports granular, specific, and actionable information that identifies areas to address. Additionally, use the datadog-agent flare command to quickly surface issues that need to be addressed by Datadog Support.

Recommendations: Use the status and flare commands from day one.
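For teams that want to fold the status command into their own tooling, here is a minimal sketch that runs datadog-agent status and keeps only the lines that look like they need attention. The "[ERROR]" and "[WARNING]" markers are an assumption about the output format, and on many hosts the command needs elevated privileges. The flare command is left for an operator to run by hand, since it may prompt for input.

```python
import shutil
import subprocess
from typing import List, Optional

AGENT_BINARY = "datadog-agent"
# Assumed markers for problem lines in the status output; adjust to what you see.
ATTENTION_MARKERS = ("[ERROR]", "[WARNING]")


def collect_agent_status(timeout_seconds: int = 60) -> Optional[str]:
    """Run `datadog-agent status`; return None if the Agent binary is not on PATH."""
    if shutil.which(AGENT_BINARY) is None:
        return None
    result = subprocess.run(
        [AGENT_BINARY, "status"],
        capture_output=True, text=True, timeout=timeout_seconds, check=False,
    )
    return result.stdout


def lines_needing_attention(status_text: str) -> List[str]:
    """Return the stripped lines that contain one of the attention markers."""
    return [
        line.strip()
        for line in status_text.splitlines()
        if any(marker in line for marker in ATTENTION_MARKERS)
    ]


if __name__ == "__main__":
    output = collect_agent_status()
    if output is None:
        print("datadog-agent not found on this host")
    else:
        for line in lines_needing_attention(output):
            print(line)
```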

Administration Tasks

As with all enterprise software, ongoing maintenance tasks for Datadog must be well-organized and adhere to your local policies. Common ongoing tasks include:

Usage Monitoring

Monitoring consumption is essential, as is adopting the tools provided for this purpose. Datadog provides an estimated usage metrics dashboard that can serve as the foundation for this capability. There are also out-of-the-box dashboards for visualizing the estimated usage across all of your logs, metrics, and traces.
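Beyond dashboards, usage can also be watched with a monitor. The sketch below builds a monitor definition on an estimated usage metric as a plain dictionary. The threshold, the team tag, and the message are illustrative assumptions, and the metric name in the example should be checked against what your account reports.

```python
import json


def usage_monitor(metric: str, threshold: int, window: str = "last_1h") -> dict:
    """Build a metric monitor definition that alerts when usage exceeds a budget."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    # The doubled braces render as the literal {*} scope in the query.
    query = f"avg({window}):sum:{metric}{{*}} > {threshold}"
    return {
        "name": f"Estimated usage above budget: {metric}",
        "type": "metric alert",
        "query": query,
        # The message and tag are placeholders for your own conventions.
        "message": "Estimated usage crossed the agreed budget.",
        "tags": ["team:datadog-admins"],
        "options": {"thresholds": {"critical": threshold}},
    }


if __name__ == "__main__":
    definition = usage_monitor("datadog.estimated_usage.hosts", 500)
    print(json.dumps(definition, indent=2))
```

Keeping the definition as data makes it easy to review in version control before it is created through the API or Terraform.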

Deploy Dashboards and monitors

After your users become familiar with Datadog, they may request refinements and adjustments to frequently used items such as dashboards and monitors. These components, along with SLOs and other content objects, are designed for iterative development and are written in JSON. They can be cloned, exported, modified, imported, and stored as flat files. Additionally, a Terraform provider is available, along with a dashboards API for interacting with and creating dashboards.
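Because these objects are JSON, the clone, modify, and store-as-flat-file workflow can be sketched with the standard library alone. In this minimal example, the list of server-generated fields to strip and the file layout are assumptions; compare them against a real export before relying on them.

```python
import copy
import json
from pathlib import Path

# Fields assumed to be generated by the server and not wanted in a clone.
SERVER_FIELDS = ("id", "url", "author_handle", "created_at", "modified_at")


def clone_dashboard(definition: dict, new_title: str) -> dict:
    """Return a deep copy with a new title and server-generated fields removed."""
    clone = copy.deepcopy(definition)
    for field in SERVER_FIELDS:
        clone.pop(field, None)
    clone["title"] = new_title
    return clone


def save_dashboard(definition: dict, path: Path) -> None:
    """Write the definition as stable, diff-friendly JSON."""
    text = json.dumps(definition, indent=2, sort_keys=True) + "\n"
    path.write_text(text, encoding="utf-8")


def load_dashboard(path: Path) -> dict:
    """Read a dashboard definition back from a flat file."""
    return json.loads(path.read_text(encoding="utf-8"))
```

Sorting keys on write keeps diffs small when the file is stored in version control, which helps when several people iterate on the same dashboard.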

When creating dashboards, prioritize the content you want to display over the construction process. This creative process is supported by the dashboard creation tools and by the pre-built dashboards that come with the product. Each dashboard within the 800 integrations is a value-added template for monitoring its corresponding technology. Out-of-the-box dashboards offer the benefit of Datadog’s experience and prescriptive model for observability.

Recommendations:

A common out-of-the-box (OOTB) dashboard is the AWS EC2 Overview Dashboard.

[Figure: AWS EC2 Overview Dashboard]

API key rotation

The Datadog platform uses standard RESTful API key authentication and recommends following standard API key security practices, including key rotation. It is also beneficial to assign these keys to logical working groups to improve the security profile and simplify rotation.

Recommendations:

Incorporate Datadog API and App Keys into your own systems for key management. Organize keys into groups that can be easily maintained.
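To illustrate grouping keys for rotation, the following minimal sketch takes an inventory of key records and reports, per group, which keys are older than a maximum age. The KeyRecord structure, the group names, and the age policy are hypothetical and would come from your own key management system.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class KeyRecord:
    """Hypothetical inventory entry; store only metadata, never the secret."""
    name: str
    group: str
    created: date


def keys_due_for_rotation(
    keys: list[KeyRecord], max_age_days: int, today: date
) -> dict[str, list[str]]:
    """Group the names of keys created on or before the rotation cutoff."""
    cutoff = today - timedelta(days=max_age_days)
    due: dict[str, list[str]] = {}
    for key in keys:
        if key.created <= cutoff:
            due.setdefault(key.group, []).append(key.name)
    return due


if __name__ == "__main__":
    inventory = [
        KeyRecord("payments-agent", "payments", date(2024, 1, 1)),
        KeyRecord("search-agent", "search", date(2024, 3, 1)),
    ]
    print(keys_due_for_rotation(inventory, 90, date(2024, 4, 10)))
```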

RBAC objects: roles, teams, and permission sets

Datadog RBAC relies on your SAML provider and on the AD/LDAP store upstream of that provider. It can mirror the AD user groups and assign Datadog-specific permissions through a standard group mapping. Datadog admins and SAML/AD/LDAP admins need to collaborate to exchange the specific group names and attributes used in the key-value structure.
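The key-value structure of a group mapping can be pictured as a small lookup: each rule pairs a SAML attribute key and value with a role. The sketch below is only a local model for agreeing on names with your identity team. The attribute key, group names, and role names are assumptions, and the real mapping is configured in Datadog itself.

```python
def roles_for_user(
    saml_attributes: dict[str, list[str]],
    mappings: list[tuple[str, str, str]],
) -> list[str]:
    """Return the sorted roles whose (key, value) rule matches the user's attributes."""
    roles = set()
    for attribute_key, attribute_value, role in mappings:
        if attribute_value in saml_attributes.get(attribute_key, []):
            roles.add(role)
    return sorted(roles)


if __name__ == "__main__":
    # Hypothetical rules: (attribute key, attribute value, role name).
    rules = [
        ("memberOf", "grp-observability-admins", "Datadog Admin Role"),
        ("memberOf", "grp-engineering", "Datadog Standard Role"),
    ]
    user = {"memberOf": ["grp-engineering", "grp-payments"]}
    print(roles_for_user(user, rules))
```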

Datadog Agent updates

Agent components are regularly updated with security and feature enhancements, so it’s best to remain up-to-date. Follow your local procedures for testing and release of new software.

Recommendations:

Include Datadog upgrades within existing patch management standards and upgrade policies. Subscribe to Datadog’s release feed and closely monitor your Fleet Automation page for Agents that require upgrades.
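As a sketch of how an upgrade policy could be checked against an inventory, the following example compares Agent versions numerically against a minimum. The inventory dictionary and the host names are hypothetical, and the parser assumes plain dotted numeric versions without release-candidate suffixes.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Convert a dotted version such as '7.52.0' into a tuple of integers."""
    return tuple(int(part) for part in version.strip().split("."))


def agents_needing_upgrade(inventory: dict[str, str], minimum: str) -> list[str]:
    """Return the sorted host names whose Agent version is below the minimum."""
    floor = parse_version(minimum)
    return sorted(
        host for host, version in inventory.items()
        if parse_version(version) < floor
    )


if __name__ == "__main__":
    # Hypothetical inventory of host name to installed Agent version.
    fleet = {"web-1": "7.52.0", "web-2": "7.9.1", "db-1": "7.60.1"}
    print(agents_needing_upgrade(fleet, "7.52.0"))
```

Comparing integer tuples rather than strings matters here: as text, "7.9.1" sorts after "7.52.0", which would hide an outdated Agent.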

Summary

Datadog administration has several activities that should fit well into your existing process standards. Incorporate Datadog into your standard systems for key rotation, patch updates, onboarding, and Infrastructure as Code (IaC). Publish these standards early to guide users in getting started with your new Datadog installation.

Next steps

After successfully planning, setting up, and maintaining your Datadog installation, use the following resources to support your ongoing Datadog journey:

Further Reading

Additional helpful documentation, links, and articles:
