Quartz running on Kubernetes using ReplicaSets

I have a Quartz job that runs every 15 minutes in a cluster of 6 pods running on Kubernetes. The app hit an OOM error mid-job, and the pod was automatically killed and restarted when the liveness probe failed.

Because the pods are managed by a ReplicaSet, the replacement pod came up with a different name, and the job was stuck in a "BLOCKED" state; essentially, the job was tied to the instance name of the pod that was deleted.

This is relatively easy to resolve at the DB level, but how can we be more resilient and prevent the infinite blocking going forward? Does anyone have any suggestions? I would like to avoid StatefulSets in Kubernetes if possible.
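
For context, one direction that might be worth checking is Quartz's own JDBC clustering. With clustering enabled, surviving nodes are meant to notice an instance that has stopped checking in and release the triggers it held. The sketch below only builds the relevant properties. The scheduler name `MyClusteredScheduler`, the class name `QuartzClusterProps`, and the check-in interval are illustrative assumptions, not values from my setup, and I have not verified that this resolves the blocked state in this exact scenario.

```java
import java.util.Properties;

// Hypothetical helper: assembles Quartz properties for a JDBC-clustered scheduler.
public class QuartzClusterProps {

    public static Properties clusteredProperties() {
        Properties props = new Properties();
        // Every pod should use the same scheduler name to join one logical cluster (name is an assumption).
        props.setProperty("org.quartz.scheduler.instanceName", "MyClusteredScheduler");
        // AUTO lets Quartz generate a unique instance id per node, so nothing depends on the pod name.
        props.setProperty("org.quartz.scheduler.instanceId", "AUTO");
        // Clustering needs a JDBC job store shared by all pods.
        props.setProperty("org.quartz.jobStore.class", "org.quartz.impl.jdbcjobstore.JobStoreTX");
        props.setProperty("org.quartz.jobStore.isClustered", "true");
        // How often (ms) each node checks in; this affects how quickly a dead node is detected (value is illustrative).
        props.setProperty("org.quartz.jobStore.clusterCheckinInterval", "20000");
        return props;
    }

    public static void main(String[] args) {
        // Print the properties so they can be compared with an existing quartz.properties file.
        clusteredProperties().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

These properties could be passed to `new StdSchedulerFactory(props)` together with the usual data source settings, which are omitted here. If the interrupted run should be re-executed on another node, the job can additionally be built with `JobBuilder.requestRecovery()`.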



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source