Custom OpenMetrics Check

Overview

This page dives into the OpenMetricsBaseCheckV2 interface for more advanced usage, including an example of a simple check that collects timing metrics and status events from Kong. For details on configuring a basic OpenMetrics check, see Kubernetes Prometheus and OpenMetrics metrics collection.

Note: OpenMetricsBaseCheckV2 is available in Agent v7.26.x+ and requires Python 3.

If you are looking for the legacy implementation or OpenMetricsBaseCheck interface custom check guide, please see Custom Legacy OpenMetrics Check.

Advanced usage: OpenMetrics check interface

If you have more advanced needs than the generic check, such as metrics preprocessing, you can write a custom OpenMetricsBaseCheckV2. It is the base class of the generic check, and it provides a structure and some helpers to collect metrics, events, and service checks exposed with Prometheus. The minimal configuration for checks based on this class includes:

  • Creating a default instance with namespace and metrics mapping.
  • Implementing the check() method, and/or
  • Creating a method named after the OpenMetrics metric it handles (see self.prometheus_metric_name).

See this example in the Kong integration, where the value of the Prometheus metric kong_upstream_target_health is used as a service check.
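To make the service check idea concrete, here is a plain-Python sketch that does not use the Datadog base classes. It assumes the gauge reports 1 for a healthy target and 0 for an unhealthy one, which is a simplification. health_to_status and the sample target names are hypothetical. The status values 0 to 3 follow the OK, WARNING, CRITICAL, UNKNOWN convention of Datadog service checks.

```python
# Hypothetical sketch: turn a health gauge into a Datadog-style
# service check status. Assumes 1 means healthy and 0 means unhealthy.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # service check status values


def health_to_status(value):
    """Map a raw health gauge value to a service check status."""
    if value is None:
        return UNKNOWN  # no sample for this target
    if value == 1:
        return OK
    if value == 0:
        return CRITICAL
    return UNKNOWN  # any other value is treated as unknown in this sketch


# Hypothetical samples keyed by target name
samples = {"target-a": 1.0, "target-b": 0.0, "target-c": None}
statuses = {target: health_to_status(v) for target, v in samples.items()}
print(statuses)  # {'target-a': 0, 'target-b': 2, 'target-c': 3}
```

In a real check, the status would be submitted through the base class rather than printed. The point here is only the value-to-status mapping.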

Writing a custom OpenMetrics check

This is a simple example of writing a Kong check to illustrate usage of the OpenMetricsBaseCheckV2 class. The example below replicates the functionality of the following generic OpenMetrics check:

instances:
  - openmetrics_endpoint: http://localhost:8001/status/
    namespace: "kong"
    metrics:
      - kong_bandwidth: bandwidth
      - kong_http_consumer_status: http.consumer.status
      - kong_http_status: http.status
      - kong_latency:
          name: latency
          type: counter
      - kong_memory_lua_shared_dict_bytes: memory.lua.shared_dict.bytes
      - kong_memory_lua_shared_dict_total_bytes: memory.lua.shared_dict.total_bytes
      - kong_nginx_http_current_connections: nginx.http.current_connections
      - kong_nginx_stream_current_connections: nginx.stream.current_connections
      - kong_stream_status: stream.status

Configuration

The names of the configuration and check files must match. If your check is called mycheck.py, your configuration file must be named mycheck.yaml.
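The naming rule can be expressed as a small helper. This is an illustrative sketch only: expected_config_path is hypothetical, and the checks.d and conf.d directory names are assumed to be relative to the Agent's configuration directory.

```python
from pathlib import Path


def expected_config_path(check_file, conf_dir="conf.d"):
    """Hypothetical helper: return the config path matching a check module."""
    # "mycheck.py" -> "mycheck"; the configuration file reuses that stem.
    stem = Path(check_file).stem
    return Path(conf_dir) / f"{stem}.yaml"


print(expected_config_path("checks.d/mycheck.py"))  # conf.d/mycheck.yaml on POSIX systems
```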

Configuration for an OpenMetrics check is almost the same as for a regular Agent check. The main difference is that your check.yaml file must include the openmetrics_endpoint option. For this example, the following goes into conf.d/kong.yaml:

init_config:

instances:
    # URL of the Prometheus metrics endpoint
  - openmetrics_endpoint: http://localhost:8001/status/

Writing the check

All OpenMetrics checks inherit from the OpenMetricsBaseCheckV2 class:

from datadog_checks.base import OpenMetricsBaseCheckV2

class KongCheck(OpenMetricsBaseCheckV2):
    pass  # The class body is filled in over the following sections

Define the integration namespace

The value of __NAMESPACE__ will prefix all metrics and service checks collected by this integration.

from datadog_checks.base import OpenMetricsBaseCheckV2

class KongCheck(OpenMetricsBaseCheckV2):
    __NAMESPACE__ = "kong"

Define a metrics mapping

The metrics mapping lets you rename metrics and override their native metric type.

from datadog_checks.base import OpenMetricsBaseCheckV2

class KongCheck(OpenMetricsBaseCheckV2):
    __NAMESPACE__ = "kong"

    def __init__(self, name, init_config, instances):
        super(KongCheck, self).__init__(name, init_config, instances)

        self.metrics_map = {
            'kong_bandwidth': 'bandwidth',
            'kong_http_consumer_status': 'http.consumer.status',
            'kong_http_status': 'http.status',
            'kong_latency': {
                'name': 'latency',
                'type': 'counter',
            },
            'kong_memory_lua_shared_dict_bytes': 'memory.lua.shared_dict.bytes',
            'kong_memory_lua_shared_dict_total_bytes': 'memory.lua.shared_dict.total_bytes',
            'kong_nginx_http_current_connections': 'nginx.http.current_connections',
            'kong_nginx_stream_current_connections': 'nginx.stream.current_connections',
            'kong_stream_status': 'stream.status',
        }
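A plain-Python sketch of how a mapping like this can be read, without the Datadog base classes. resolve is a hypothetical helper. It treats unmapped metrics as not collected, applies the __NAMESPACE__ prefix, and does not model any type-specific suffixes the base class may add to submitted names.

```python
NAMESPACE = "kong"

# Subset of the mapping above: a value is either a new name or a dict of overrides.
METRICS_MAP = {
    "kong_bandwidth": "bandwidth",
    "kong_latency": {"name": "latency", "type": "counter"},
}


def resolve(raw_name, metrics_map=METRICS_MAP, namespace=NAMESPACE):
    """Hypothetical helper: return (submitted_name, type_override) or None."""
    entry = metrics_map.get(raw_name)
    if entry is None:
        return None  # not in the mapping, so not collected in this sketch
    if isinstance(entry, str):
        name, metric_type = entry, None  # rename only, keep the native type
    else:
        name, metric_type = entry["name"], entry.get("type")
    return f"{namespace}.{name}", metric_type


print(resolve("kong_bandwidth"))  # ('kong.bandwidth', None)
print(resolve("kong_latency"))    # ('kong.latency', 'counter')
print(resolve("kong_unknown"))    # None
```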

Define a default instance

A default instance is the basic configuration used for the check. Override the get_default_config method of OpenMetricsBaseCheckV2 to return your default instance. In this example, the default instance supplies the metrics mapping, and openmetrics_endpoint comes from conf.d/kong.yaml.

from datadog_checks.base import OpenMetricsBaseCheckV2

class KongCheck(OpenMetricsBaseCheckV2):
    __NAMESPACE__ = "kong"

    def __init__(self, name, init_config, instances):
        super(KongCheck, self).__init__(name, init_config, instances)

        self.metrics_map = {
            'kong_bandwidth': 'bandwidth',
            'kong_http_consumer_status': 'http.consumer.status',
            'kong_http_status': 'http.status',
            'kong_latency': {
                'name': 'latency',
                'type': 'counter',
            },
            'kong_memory_lua_shared_dict_bytes': 'memory.lua.shared_dict.bytes',
            'kong_memory_lua_shared_dict_total_bytes': 'memory.lua.shared_dict.total_bytes',
            'kong_nginx_http_current_connections': 'nginx.http.current_connections',
            'kong_nginx_stream_current_connections': 'nginx.stream.current_connections',
            'kong_stream_status': 'stream.status',
        }

    def get_default_config(self):
        # metrics takes a list of mappings, matching the YAML format shown earlier
        return {'metrics': [self.metrics_map]}
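A sketch of how a default instance and a user instance might be combined, assuming values from the instance take precedence over the defaults. collections.ChainMap models that lookup order. Whether the base class uses exactly this mechanism is an assumption here, and the timeout values are illustrative.

```python
from collections import ChainMap

# Defaults that a get_default_config() override might return (illustrative values).
defaults = {
    "metrics": [{"kong_bandwidth": "bandwidth"}],
    "timeout": 10,
}

# The instance as read from conf.d/kong.yaml (timeout added for illustration).
instance = {
    "openmetrics_endpoint": "http://localhost:8001/status/",
    "timeout": 5,
}

# Lookups check the instance first, then fall back to the defaults.
config = ChainMap(instance, defaults)

print(config["openmetrics_endpoint"])  # http://localhost:8001/status/
print(config["timeout"])               # 5 (instance value wins)
print(config["metrics"])               # [{'kong_bandwidth': 'bandwidth'}]
```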

Implementing the check method

If you want to implement additional features, override the check() method.

From the instance, read openmetrics_endpoint, the Prometheus or OpenMetrics endpoint to poll metrics from:

def check(self, instance):
    endpoint = instance.get('openmetrics_endpoint')

Exceptions

If a check cannot run because of improper configuration, a programming error, or because it could not collect any metrics, it should raise a meaningful exception. This exception is logged and is shown in the Agent status command for debugging. For example:

$ sudo datadog-agent status

  Checks
  ======

    my_custom_check
    ---------------
      - instance #0 [ERROR]: Unable to find openmetrics_endpoint in config file.
      - Collected 0 metrics & 0 events

Improve your check() method with ConfigurationError:

from datadog_checks.base import ConfigurationError

def check(self, instance):
    endpoint = instance.get('openmetrics_endpoint')
    if endpoint is None:
        raise ConfigurationError("Unable to find openmetrics_endpoint in config file.")
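The same validation can be tried outside the Agent with a small stand-in. In this sketch, ConfigurationError is a local placeholder for datadog_checks.base.ConfigurationError, get_endpoint is hypothetical, and the URL scheme check is an extra assumption that goes beyond the example above.

```python
from urllib.parse import urlparse


class ConfigurationError(Exception):
    """Local stand-in for datadog_checks.base.ConfigurationError in this sketch."""


def get_endpoint(instance):
    """Hypothetical helper: return the endpoint or raise a descriptive error."""
    endpoint = instance.get("openmetrics_endpoint")
    if endpoint is None:
        raise ConfigurationError("Unable to find openmetrics_endpoint in config file.")
    parsed = urlparse(endpoint)
    # Extra check (not in the example above): require an http(s) URL with a host.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ConfigurationError(f"Invalid openmetrics_endpoint: {endpoint!r}")
    return endpoint


print(get_endpoint({"openmetrics_endpoint": "http://localhost:8001/status/"}))
try:
    get_endpoint({})
except ConfigurationError as exc:
    print(exc)  # Unable to find openmetrics_endpoint in config file.
```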

Then call the base class's check() method, which scrapes the endpoint and submits the collected data:

def check(self, instance):
    endpoint = instance.get('openmetrics_endpoint')
    if endpoint is None:
        raise ConfigurationError("Unable to find openmetrics_endpoint in config file.")

    super().check(instance)
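How super().check() works internally is beyond this guide. Conceptually, though, the base class fetches the endpoint and parses the text exposition format into samples before applying the metrics mapping. The sketch below shows only a parsing step for a simplified subset of the format (no escaped quotes, timestamps, or exemplars). parse_samples and the sample payload are hypothetical. In a real check, rely on the base class rather than parsing the payload yourself.

```python
import re

# Matches "name value" or "name{labels} value"; ignores timestamps and escapes.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Hypothetical helper: yield (name, labels, value) from exposition lines."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line in this simplified grammar
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        yield name, labels, float(value)


# Made-up payload for illustration only
payload = """\
# HELP kong_bandwidth Total bandwidth in bytes
# TYPE kong_bandwidth counter
kong_bandwidth{type="egress",service="demo"} 1024
kong_nginx_http_current_connections{state="active"} 3
"""
for sample in parse_samples(payload):
    print(sample)
```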

Putting it all together

from datadog_checks.base import OpenMetricsBaseCheckV2
from datadog_checks.base import ConfigurationError

class KongCheck(OpenMetricsBaseCheckV2):
    __NAMESPACE__ = "kong"

    def __init__(self, name, init_config, instances):
        super(KongCheck, self).__init__(name, init_config, instances)

        self.metrics_map = {
            'kong_bandwidth': 'bandwidth',
            'kong_http_consumer_status': 'http.consumer.status',
            'kong_http_status': 'http.status',
            'kong_latency': {
                'name': 'latency',
                'type': 'counter',
            },
            'kong_memory_lua_shared_dict_bytes': 'memory.lua.shared_dict.bytes',
            'kong_memory_lua_shared_dict_total_bytes': 'memory.lua.shared_dict.total_bytes',
            'kong_nginx_http_current_connections': 'nginx.http.current_connections',
            'kong_nginx_stream_current_connections': 'nginx.stream.current_connections',
            'kong_stream_status': 'stream.status',
        }

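    def get_default_config(self):
        # metrics takes a list of mappings, matching the YAML format shown earlier
        return {'metrics': [self.metrics_map]}

    def check(self, instance):
        endpoint = instance.get('openmetrics_endpoint')
        if endpoint is None:
            raise ConfigurationError("Unable to find openmetrics_endpoint in config file.")

        # Let the base class scrape the endpoint and submit the mapped metrics
        super().check(instance)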

Going further

To read more about Prometheus and OpenMetrics base integrations, see the integrations developer docs.

To see all configuration options available for OpenMetrics checks, see the conf.yaml.example. You can improve your OpenMetrics check by including default values for additional configuration options:

exclude_metrics
Use this list to ignore metrics, for example because they are duplicates or introduce high cardinality. Metrics in this list are silently skipped, without an "Unable to handle metric" debug line in the logs. To exclude all metrics except those matching a specific filter, use a negative lookahead regex such as: - ^(?!foo).*$
share_labels
If provided, the share_labels mapping lets you share labels across multiple metrics. The keys are the exposed metrics to share labels from, and the values are mappings that configure the sharing behavior. Each mapping must have at least one of the following keys: labels, match, or values.
exclude_labels
exclude_labels is an array of labels to exclude. Those labels are not added as tags when submitting the metric.
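The filtering behavior described for exclude_metrics and exclude_labels can be sketched in plain Python. is_excluded and drop_labels are hypothetical helpers. Whether the real check uses match or search semantics is an assumption, although the leading ^ anchor makes the example regex behave the same either way.

```python
import re

# Negative lookahead from the exclude_metrics description: matches every
# name that does NOT start with "foo", so everything except foo* is excluded.
EXCLUDE_ALL_BUT_FOO = re.compile(r"^(?!foo).*$")


def is_excluded(metric_name, patterns=(EXCLUDE_ALL_BUT_FOO,)):
    """Hypothetical helper: True if any exclusion pattern matches the name."""
    return any(p.match(metric_name) for p in patterns)


def drop_labels(labels, exclude_labels):
    """Hypothetical helper: return the labels minus those in exclude_labels."""
    return {k: v for k, v in labels.items() if k not in exclude_labels}


print(is_excluded("foo_requests_total"))  # False: kept
print(is_excluded("bar_requests_total"))  # True: skipped
print(drop_labels({"service": "demo", "pod": "abc-123"}, {"pod"}))  # {'service': 'demo'}
```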
