How to query job count and calculate success rate of kube-jobs
I am using Grafana to build a dashboard that calculates the success rate of kube jobs. kube_job_status_succeeded gives the list of jobs that were successful, and kube_job_complete gives the list of completed jobs, but completion does not guarantee that a job was successful. I have created a recording rule for each metric:
rules:
  - record: sre_kube_build_test_job_complete_total
    expr: kube_job_complete{namespace="openshift-build-test",condition="true"}
  - record: sre_kube_build_test_job_succes_total
    expr: kube_job_status_succeeded{namespace="openshift-build-test"} == 1
However, I am not able to get the number of jobs completed over time with this query:
sum_over_time(sre_kube_build_test_job_complete_total[1h])
It returns the result below, which is not correct. In total, 7 jobs ran during that hour, and 3 of those 7 were successful.
{condition="true", container="kube-rbac-proxy-main", endpoint="https-main", job="kube-state-metrics", job_name="sre-build-test-27468371", namespace="openshift-build-test", service="kube-state-metrics"}
18
{condition="true", container="kube-rbac-proxy-main", endpoint="https-main", job="kube-state-metrics", job_name="sre-build-test-27468431", namespace="openshift-build-test", service="kube-state-metrics"}
102
Does anyone have suggestions on how to get the correct counts?
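A note on the values above: sum_over_time adds up every sample in the range, and each recorded sample here has the value 1, so 18 and 102 are most likely the number of samples each job's series had during the hour rather than a job count. A minimal sketch of one alternative, assuming each job appears as exactly one series (one job_name) in the recorded metrics, is to collapse each series to a single value first and then count the series. These are three separate queries, not one expression:

```promql
# Number of jobs seen as complete at any point in the last hour:
# max_over_time yields one value per series, count() counts the series.
count(max_over_time(sre_kube_build_test_job_complete_total[1h]))

# Number of jobs seen as succeeded in the last hour.
count(max_over_time(sre_kube_build_test_job_succes_total[1h]))

# Success rate as a fraction between 0 and 1.
  count(max_over_time(sre_kube_build_test_job_succes_total[1h]))
/
  count(max_over_time(sre_kube_build_test_job_complete_total[1h]))
```

Two caveats with this sketch. It counts every job whose series existed during the window, so Job objects that finished earlier but have not yet been deleted from the cluster are still included. Also, when no series matches in the window, count() returns an empty result rather than 0, so the ratio can come back empty.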
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
