Calculating live elapsed time using Prometheus
We have around 16k batch jobs that run on a regular basis. Each job has a name, and each daily run of these 16k jobs has a run-id.
Since these jobs take a good amount of time to finish, I want a live timer in Grafana that tells me how long a job has been running, e.g. now() - 'start-time of job', or, if the job has completed, 'end-time of job' - 'start-time of job'.
Our infrastructure is mainly Prometheus & Grafana. At first, I had the following idea based on heartbeats (all abstract; I'm finding it hard to map it onto Prometheus & Grafana):
On job start, emit status=1 (gauge) (counter will increment).
On job end, emit status=2 (gauge).
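To make the heartbeat idea a bit more concrete, here is a minimal sketch of what one emitted status sample could look like in the Prometheus text exposition format. The metric name batch_job_status and the label names job_name and run_id are my own assumptions for illustration, not anything we have in place:

```python
def escape_label(value: str) -> str:
    """Escape backslash, double quote and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_status(job_name: str, run_id: str, status: int) -> str:
    """Render one gauge sample: status=1 on job start, status=2 on job end.

    The metric name and both label names are hypothetical.
    """
    labels = (
        f'job_name="{escape_label(job_name)}",'
        f'run_id="{escape_label(run_id)}"'
    )
    return (
        "# TYPE batch_job_status gauge\n"
        f"batch_job_status{{{labels}}} {status}\n"
    )


if __name__ == "__main__":
    # A job run that has just started.
    print(render_status("nightly_export", "2024-01-01", 1), end="")
```

One thing to keep in mind with this shape: one series per (job_name, run_id) pair means 16k new series every day.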
Now the elapsed time in pseudocode would be:
(get(status=2).map(timestamps).min or now()) - get(status=1).map(timestamps).min
This assumes get returns a vector of events of the form [status=<x>, timestamp].
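Written out as plain Python, the calculation I have in mind looks roughly like this. It is only a sketch of the logic, not of how Prometheus would evaluate it: the (status, timestamp) event tuples and the function name elapsed_seconds are assumptions for illustration.

```python
import time
from typing import List, Optional, Tuple

# Hypothetical event shape: (status, unix timestamp in seconds).
Event = Tuple[int, float]

STARTED = 1   # emitted on job start
FINISHED = 2  # emitted on job end


def elapsed_seconds(events: List[Event], now: Optional[float] = None) -> Optional[float]:
    """Return how long a job run has been (or was) running.

    Uses the earliest status=2 timestamp if the run finished,
    otherwise the current time, minus the earliest status=1 timestamp.
    Returns None if the run has not emitted a start event yet.
    """
    starts = [ts for status, ts in events if status == STARTED]
    if not starts:
        return None
    ends = [ts for status, ts in events if status == FINISHED]
    if ends:
        end = min(ends)
    else:
        end = time.time() if now is None else now
    return end - min(starts)


if __name__ == "__main__":
    # Still running: two start heartbeats, no end event yet.
    print(elapsed_seconds([(1, 100.0), (1, 130.0)], now=160.0))  # 60.0
    # Completed: the end timestamp is used instead of "now".
    print(elapsed_seconds([(1, 100.0), (2, 400.0)], now=999.0))  # 300.0
```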
Is Prometheus even the right tool for this?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
