
PromQL

Documentation on the Prometheus Query Language (PromQL) is available on the web:

PromQL Documentation

PromQL Cheat Sheet

PromQL Example

Alerting Rule Creation with PromQL

To create an alerting rule with PromQL, go to the notification page and create a new notification rule.

Select a scope:

  • If you select Any server, your PromQL query runs in the Bleemeo Cloud infrastructure and is evaluated for all of your servers.
  • If you select A specific server, your PromQL query runs on that server's Glouton (the Bleemeo agent) and is evaluated only for this server.
  • If you select A group of servers, your PromQL query runs on the Glouton of each server in the group and is evaluated only for the agents of this group.

Then select Use PromQL to define conditions to trigger alarm.


Choose a name for the rule. This name is used as the alert name in the status dashboard and in notifications.

You can add a warning query, a critical query, or both, and configure a delay for each. The delay is the time during which the threshold of the PromQL query must remain exceeded before the status changes.


For a PromQL query to trigger alerts, it must at least have the form: metric_name > threshold
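
As a minimal sketch of that form, assuming standard PromQL comparison semantics (the threshold value 90 is only illustrative): a comparison keeps the series whose current value satisfies it and drops the others, so the query returns nothing as long as no series is above the threshold.

```promql
# Returns one series for each server whose cpu_used is currently above 90.
# Returns an empty result when no server is above 90, so nothing can alert.
cpu_used > 90
```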

Example: Cassandra

We have several Cassandra servers running on our Kubernetes cluster. We want to trigger a warning alert if the summed CPU usage of the containers (container_cpu_used) exceeds 50% for 300 seconds, and a critical alert if it exceeds 50% for 600 seconds.

You can use the same PromQL query for both warning and critical, with only the delays differing.

PromQL

sum(container_cpu_used{item=~"k8s_cassandra_cassandra.*"}) > 50

The matcher item=~"k8s_cassandra_cassandra.*" is a regular-expression match: it selects all the containers whose item name begins with k8s_cassandra_cassandra.
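
The label matcher used in this query can be compared with the other standard PromQL matchers. The sketch below assumes standard Prometheus matcher semantics, where regular expressions are fully anchored; the item value k8s_cassandra_cassandra-0 is hypothetical and only serves as an illustration.

```promql
# Exact match: only the container whose item is exactly this string
# (hypothetical item value).
container_cpu_used{item="k8s_cassandra_cassandra-0"}

# Regex match, anchored at both ends: the item must start with the prefix,
# and ".*" accepts whatever follows it.
container_cpu_used{item=~"k8s_cassandra_cassandra.*"}

# Negative regex match: every container except the ones selected above.
container_cpu_used{item!~"k8s_cassandra_cassandra.*"}
```

Because the expression is anchored, dropping the trailing ".*" would match only an item named exactly k8s_cassandra_cassandra.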


Example: Workers Load

We have a Kubernetes cluster with 3 workers. We want to trigger a warning alert if the sum of system_load1 across the 3 workers exceeds 15 for 300 seconds, and a critical alert if it exceeds 30 for 300 seconds.

PromQL Warning:

sum(system_load1{instance=~"host-fqdn-1|host-fqdn-2|host-fqdn-3"}) > 15

PromQL Critical:

sum(system_load1{instance=~"host-fqdn-1|host-fqdn-2|host-fqdn-3"}) > 30
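
A possible variant, shown here only as a sketch, is to alert on the average load instead of the sum. The threshold of 5 is simply the warning threshold 15 divided by the 3 workers, so both forms express the same condition as long as all three workers report the metric.

```promql
# Average load across the three workers.
# With all three workers reporting, avg > 5 is equivalent to sum > 15.
avg(system_load1{instance=~"host-fqdn-1|host-fqdn-2|host-fqdn-3"}) > 5
```

The two forms differ when a worker stops reporting: the sum drops because one term is missing, whereas the average is computed over the remaining workers only.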


Example: Multi Metrics

We want to create a notification rule that applies to any server. It should trigger a warning alert if a server has cpu_used > 50 and mem_used_perc > 80 for 120 seconds, and a critical alert if a server has cpu_used > 80 and mem_used_perc > 90 for 300 seconds.

If you want to combine several metrics in one PromQL query, the label sets of the metrics must be identical. Otherwise, you can restrict the matching to specific labels with `on (label_name)`, as in the example below.

PromQL Warning:

cpu_used > 50 and on (instance) mem_used_perc > 80

PromQL Critical:

cpu_used > 80 and on (instance) mem_used_perc > 90
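
To make the role of `on (instance)` concrete, the sketch below contrasts the two forms under standard PromQL vector-matching semantics. Whether cpu_used and mem_used_perc carry different label sets in your account is an assumption to check against your own data.

```promql
# Without on(): a cpu_used series is kept only if a mem_used_perc series
# with exactly the same labels (metric name excluded) also passes its threshold.
cpu_used > 50 and mem_used_perc > 80

# With on (instance): only the instance label is compared, so the two
# metrics may differ in their other labels and still match per server.
cpu_used > 50 and on (instance) mem_used_perc > 80
```

In both forms the result carries the labels and values of the left-hand side, so the value reported for the alert is cpu_used.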
