How to query job count and calculate success rate of kube-jobs

I am using Grafana to create a dashboard that calculates the success rate of Kubernetes jobs. kube_job_status_succeeded lists the jobs that succeeded, and kube_job_complete lists the jobs that completed, but completion does not guarantee that a job was successful. I have created a recording rule for each metric:

rules:
    - record: sre_kube_build_test_job_complete_total
      expr:  kube_job_complete{namespace="openshift-build-test",condition="true"}

    - record: sre_kube_build_test_job_succes_total
      expr:  kube_job_status_succeeded{namespace="openshift-build-test"}==1

However, I am not able to get the number of jobs completed over time with this query:

sum_over_time (sre_kube_build_test_job_complete_total [1h] )

It returns the result below, which is not correct: 7 jobs ran in the last hour, and 3 of them were successful.

{condition="true", container="kube-rbac-proxy-main", endpoint="https-main", job="kube-state-metrics", job_name="sre-build-test-27468371", namespace="openshift-build-test", service="kube-state-metrics"}
18
{condition="true", container="kube-rbac-proxy-main", endpoint="https-main", job="kube-state-metrics", job_name="sre-build-test-27468431", namespace="openshift-build-test", service="kube-state-metrics"}
102

Any suggestions?
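
One possible reading of the values 18 and 102: sum_over_time adds up every scraped sample in the window, so a series that holds the value 1 contributes one unit per scrape, and the total tracks how long each job's series existed rather than how many jobs there were. A minimal sketch of counting jobs instead, assuming each job is a distinct job_name series and that its series was scraped at least once during the hour (neither is confirmed here):

```promql
# Assumption: one series per job_name; a job is counted if any sample in the
# last hour equals 1. The recorded metric names are the ones from the rules above.
count(max_over_time(sre_kube_build_test_job_complete_total[1h]) == 1)

# Ratio of succeeded jobs to completed jobs over the same window.
# Assumption: the denominator series include every job that ran; if failed jobs
# are not reported by kube_job_complete, a different denominator is needed.
count(max_over_time(sre_kube_build_test_job_succes_total[1h]) == 1)
/
count(max_over_time(sre_kube_build_test_job_complete_total[1h]) == 1)
```

max_over_time collapses each series to a single value per job, and count then counts series rather than samples. Jobs whose objects were deleted before any scrape in the window would not be counted.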



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
