KubernetesClientException: too old resource version in long-running Spark job

We are using https://github.com/GoogleCloudPlatform/spark-on-k8s-operator to run Spark jobs on Kubernetes.

  • Spark version: 3.1.1
  • Hadoop version: 3.2.0
  • Spark image: gcr.io/spark-operator/spark:v3.1.1
  • Kubernetes client jar: kubernetes-client-4.12.0.jar

We hit this exception intermittently in long-running Spark jobs. Relevant logs:

 io.fabric8.kubernetes.client.KubernetesClientException: too old resource version
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:258)
    at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323)
    at okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
    at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
    at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
    at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
    at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
    at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)

Any pointers on what causes this and how to fix it?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
