Prometheus alert with a comparison binary operator not firing?
I'm trying to write a simplified example of an alert which fires when the Kafka consumer group lag metric exposed by the Kafka Exporter exceeds a certain value. With the following directory structure,
.
├── README.md
├── docker-compose.yml
├── kafka-exporter
│   ├── Dockerfile
│   └── run.sh
└── prometheus
    ├── alerts.rules.yml
    └── prometheus.yml
where the docker-compose.yml reads
version: '2'
networks:
  app-tier:
    driver: bridge
services:
  zookeeper:
    image: 'bitnami/zookeeper:latest'
    environment:
      - 'ALLOW_ANONYMOUS_LOGIN=yes'
    networks:
      - app-tier
  kafka:
    image: 'bitnami/kafka:latest'
    environment:
      - KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
      - ALLOW_PLAINTEXT_LISTENER=yes
    networks:
      - app-tier
  kafka-exporter:
    build: kafka-exporter
    ports:
      - "9308:9308"
    networks:
      - app-tier
    entrypoint: ["run.sh"]
  prometheus:
    image: bitnami/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - "./prometheus/prometheus.yml:/opt/bitnami/prometheus/conf/prometheus.yml"
      - "./prometheus/alerts.rules.yml:/alerts.rules.yml"
    networks:
      - app-tier
  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"
    networks:
      - app-tier
the run.sh is a wrapper script to wait for Kafka to be ready,
#!/bin/sh
while ! bin/kafka_exporter --verbosity 2; do
    echo "Waiting for the Kafka cluster to come up..."
    sleep 1
done
and the Prometheus configuration files are prometheus.yml,
global:
  scrape_interval: 10s
  scrape_timeout: 10s
  evaluation_interval: 1m
scrape_configs:
  - job_name: kafka-exporter
    metrics_path: /metrics
    honor_labels: false
    honor_timestamps: true
    sample_limit: 0
    static_configs:
      - targets: ['kafka-exporter:9308']
rule_files:
  - "/alerts.rules.yml"
and alerts.rules.yml,
groups:
  - name: alerts
    rules:
      - alert: excessive_consumer_group_lag
        expr: kafka_consumergroup_lag{topic="example"} > 10
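To spell out how I understand the expression: a comparison operator between an instant vector and a scalar (without the `bool` modifier) acts as a filter, so only series whose current value satisfies the comparison are returned, and the alert should become active for each series that remains. Below is a minimal Python sketch of that reading. It is not Prometheus code; the function name `filter_gt`, the label sets and the sample values are all made up for illustration.

```python
from typing import Dict, Tuple

# A label set is modelled as a tuple of (name, value) pairs.
Labels = Tuple[Tuple[str, str], ...]


def filter_gt(vector: Dict[Labels, float], threshold: float) -> Dict[Labels, float]:
    """Keep only the series whose value is strictly greater than threshold."""
    return {labels: value for labels, value in vector.items() if value > threshold}


# Hypothetical samples for kafka_consumergroup_lag{topic="example"}.
samples: Dict[Labels, float] = {
    (("consumergroup", "my-consumer-group"), ("partition", "0"), ("topic", "example")): 12.0,
    (("consumergroup", "my-consumer-group"), ("partition", "1"), ("topic", "example")): 3.0,
}

# Only the series with lag 12.0 survives the "> 10" filter.
active = filter_gt(samples, 10)
print(len(active))
```

Under this reading, an empty result means no alert, and any non-empty result means one active alert per remaining series.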
One thing I've omitted here is an example app which consumes from the example topic using a consumer group named my-consumer-group. I manually stop that app and then produce messages to the topic using the Kafka console producer:
> docker run -it --network kafka-exporter-example_app-tier bitnami/kafka:latest kafka-console-producer.sh --topic example --bootstrap-server kafka:9092
kafka 18:42:41.24
kafka 18:42:41.24 Welcome to the Bitnami kafka container
kafka 18:42:41.24 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-kafka
kafka 18:42:41.25 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-kafka/issues
kafka 18:42:41.25
>Message 1
>Message 2
...
After producing more than 10 messages this way, I can see the corresponding metric increase in Grafana.
However, in the Prometheus UI, the corresponding alert is neither pending nor firing.
I'm struggling to see why the alert is not firing. The expression seems similar to the one given in the example in https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/.
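One way to narrow this down might be to ask Prometheus directly, over its HTTP API, whether the expression returns any series and whether the rule was loaded at all. The following is a rough Python sketch, assuming Prometheus is reachable at localhost:9090 as mapped in docker-compose.yml; the helper names are my own and not part of any library.

```python
import json
from typing import Any, Dict
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: host port 9090 is mapped to Prometheus, as in docker-compose.yml.
BASE_URL = "http://localhost:9090"
EXPR = 'kafka_consumergroup_lag{topic="example"} > 10'


def query_url(base_url: str, expr: str) -> str:
    """Build an instant-query URL for the /api/v1/query endpoint."""
    return f"{base_url}/api/v1/query?{urlencode({'query': expr})}"


def rules_url(base_url: str) -> str:
    """URL of the endpoint that lists the loaded rule groups."""
    return f"{base_url}/api/v1/rules"


def fetch_json(url: str) -> Dict[str, Any]:
    """GET a URL and decode the JSON body (needs the stack to be running)."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def series_count(query_response: Dict[str, Any]) -> int:
    """Number of series returned by an instant query."""
    return len(query_response["data"]["result"])


def alert_states(rules_response: Dict[str, Any]) -> Dict[str, str]:
    """Map each loaded alerting rule name to its state (inactive, pending or firing)."""
    states: Dict[str, str] = {}
    for group in rules_response["data"]["groups"]:
        for rule in group["rules"]:
            if rule.get("type") == "alerting":
                states[rule["name"]] = rule["state"]
    return states
```

With the stack running, `series_count(fetch_json(query_url(BASE_URL, EXPR)))` should be non-zero if the expression currently matches any series, and `alert_states(fetch_json(rules_url(BASE_URL)))` should contain excessive_consumer_group_lag if the rule file was actually loaded. An empty mapping would point at the rule file rather than at the expression.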
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow