Quartz running on Kubernetes using ReplicaSets
I have a Quartz job that runs every 15 minutes in a cluster of 6 pods running on Kubernetes. The app hit an OOM in the middle of a Quartz job, and the pod was automatically killed and restarted when the liveness probe failed.
Because the pods are managed by a ReplicaSet, the new pod came up with a different name, and the job was stuck in a "BLOCKING" state: it was essentially tied to the instance name of the pod that had been deleted.
This is relatively easy to resolve at the DB level, but how can we be more resilient and prevent the infinite blocking going forward? Does anyone have any suggestions? I would like to avoid StatefulSets in Kubernetes if possible.
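For context, here is a minimal sketch of the kind of scheduler settings this question revolves around. It assumes a JDBC job store shared by all pods, which the post does not state explicitly. The property keys are standard Quartz configuration keys, but the class name `ClusteredQuartzProps` and the chosen values are illustrative only, not the actual configuration of the app described above. With `isClustered` enabled, Quartz instances check in periodically, and surviving instances are meant to detect and recover the work of one that stopped checking in, which may be relevant when a pod is replaced under a new name.

```java
import java.util.Properties;

public class ClusteredQuartzProps {

    // Builds an illustrative set of Quartz properties for a clustered JDBC job store.
    // All values here are assumptions for the sketch, not taken from the original post.
    static Properties build() {
        Properties p = new Properties();
        // AUTO lets Quartz generate a unique instance id per scheduler instance,
        // so a replacement pod registers as a new cluster member.
        p.setProperty("org.quartz.scheduler.instanceId", "AUTO");
        // JDBC-backed job store, shared by every pod through the same database.
        p.setProperty("org.quartz.jobStore.class",
                "org.quartz.impl.jdbcjobstore.JobStoreTX");
        // Enables cluster check-ins and detection of failed instances.
        p.setProperty("org.quartz.jobStore.isClustered", "true");
        // How often (in milliseconds) each instance checks in with the cluster.
        p.setProperty("org.quartz.jobStore.clusterCheckinInterval", "20000");
        return p;
    }

    public static void main(String[] args) {
        // Print the properties so they can be compared with an existing quartz.properties.
        build().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

In a real application these properties would be passed to `new StdSchedulerFactory(props)` together with data source and driver delegate settings, which are omitted here. Whether this clears the stuck state in this particular setup is something to verify against the actual Quartz version and schema in use.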
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
