OpenShift Prometheus - How do I alert only when there are multiple cronjob failures?
I have a couple of cronjobs running in an OpenShift cluster and want to monitor them for failures. However, I don't care about a single failure; I only want to alert when there are two or more consecutive failed jobs. The jobs run every 4 hours, so as long as a job completes successfully at least once every 8-12 hours, no alert should fire.
I've tried following the guide in a Medium.com blog article, and also tried modifying it, to no avail.
Thanks.
Solution 1:[1]
I would recommend using the OpenShift feature "Userspace Monitoring" (listed in the OpenShift docs as monitoring for user-defined projects). With this feature you can define alerting rules based on Prometheus metrics (in your case, the count of container restarts or job failures) and have Alertmanager send those alerts to your desired destination (e.g. a Slack channel).
Further details about this feature can be found in the OpenShift docs.
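To make the condition from the question concrete, here is a minimal Python sketch of the decision logic only. It is not an OpenShift or Prometheus API: the function names `consecutive_failures`, `is_stale`, and `should_alert`, and the idea of passing in a list of run outcomes, are assumptions made for illustration. In a real cluster the same condition would be written as a PromQL expression inside an alerting rule.

```python
from datetime import datetime, timedelta


def consecutive_failures(outcomes):
    """Count failed runs at the end of the history.

    `outcomes` is a hypothetical list of booleans, oldest run first,
    where True means the job run succeeded.
    """
    count = 0
    for succeeded in reversed(outcomes):
        if succeeded:
            break
        count += 1
    return count


def is_stale(last_success, now, max_age=timedelta(hours=12)):
    """Return True when no run has succeeded within `max_age`."""
    return last_success is None or now - last_success > max_age


def should_alert(outcomes, min_failures=2):
    """Alert only when the most recent `min_failures` runs all failed."""
    return consecutive_failures(outcomes) >= min_failures


if __name__ == "__main__":
    # One failure followed by nothing else: stay quiet.
    print(should_alert([True, True, False]))
    # Two failures in a row: raise the alert.
    print(should_alert([True, False, False]))
```

With a 4-hour schedule, two consecutive failures mean the last success is at least 8 hours old, so `is_stale` with a threshold between 8 and 12 hours is an alternative way to express roughly the same condition.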
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Markus Kofler |
