Create cluster alerts to notify when a percentage of groups are in critical state
Overview
This guide shows how to create alerts that do not notify for every individual group that meets the condition, but only when a given percentage of the groups do.
This is helpful, for example, if you want a monitor that alerts only when a given percentage of hosts or containers reach a critical state.
Example: Alert for a percentage of hosts with high CPU usage
In this example, you want to receive a notification when 40 percent of hosts have a CPU usage above 50 percent. Leverage the min_cutoff and count_nonzero functions:
- Use the min_cutoff function to count the number of hosts that have CPU usage above 50 percent.
- Use the count_nonzero function to count the total number of hosts.
- Divide one by the other for the resulting percentage of hosts with CPU usage above 50 percent.
- Then, set the condition to alert if the percentage of hosts in that condition reaches 40 percent.
This monitor tracks the percentage of hosts that have a CPU usage above 50 percent within the last ten minutes and generates a notification if more than 40 percent of those hosts meet the specified condition.
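The arithmetic behind this monitor can be sketched outside the monitoring platform. The Python below is a minimal illustration, not the monitor query itself: the function names fraction_above and should_alert, the sample host names, and the CPU values are all hypothetical. It assumes a strict "more than 40 percent" comparison, as in the description above.

```python
from typing import Mapping


def fraction_above(cpu_by_host: Mapping[str, float], cpu_threshold: float = 50.0) -> float:
    """Return the fraction of hosts whose CPU usage is strictly above cpu_threshold."""
    total = len(cpu_by_host)
    if total == 0:
        # No hosts reporting: treat as 0 rather than dividing by zero.
        return 0.0
    above = sum(1 for value in cpu_by_host.values() if value > cpu_threshold)
    return above / total


def should_alert(
    cpu_by_host: Mapping[str, float],
    cpu_threshold: float = 50.0,
    alert_fraction: float = 0.4,
) -> bool:
    """Alert only when more than alert_fraction of hosts exceed cpu_threshold."""
    return fraction_above(cpu_by_host, cpu_threshold) > alert_fraction


# Hypothetical sample: three of five hosts are above 50 percent CPU.
sample = {"host-a": 72.0, "host-b": 35.5, "host-c": 90.1, "host-d": 12.0, "host-e": 55.0}
print(fraction_above(sample))  # 0.6
print(should_alert(sample))    # True
```

With this sample, a per-host alert would fire three times, while the percentage-based condition produces a single result for the whole group.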