Flink cluster with ZooKeeper HA always shuts down: [RECEIVED SIGNAL 15: SIGTERM]

Environment:

Flink 1.14.4, standalone application mode on Kubernetes.

Set up according to the official steps:

Flink cluster: https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/resource-providers/standalone/kubernetes/#application-mode

ZooKeeper HA: https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/ha/zookeeper_ha/

The problem:

The JobManager shuts down and restarts every three minutes, and eventually the pod quits.

-- There is no timer task; the program logic is just a simple word count.

-- The problem also occurs every three minutes when the cluster is running with no input and nothing to do.

-- Without ZooKeeper HA, the JobManager does not have this problem.

The question:

Why does the JobManager always shut down when ZooKeeper HA is enabled, and how can I solve it?

I used the same steps and YAML as the official site, so I have no idea what causes this problem.

The code:

Just a word count; other programs show the same problem.

public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment executionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment();

    // Read lines of text from a socket; HOST and PORT are defined elsewhere (not shown).
    DataStreamSource<String> dataStreamSource = executionEnvironment.socketTextStream(HOST, PORT);

    // Map each line through MyFlatMapper, key by tuple field 0, and sum tuple field 1.
    DataStream<Tuple2<String, Integer>> sum = dataStreamSource.flatMap(new WordCount.MyFlatMapper()).keyBy(0).sum(1);

    sum.print();

    executionEnvironment.execute();
}

JobManager pod restarts and quits:


NAMESPACE   NAME                     READY   STATUS        RESTARTS       AGE
default     flink-jobmanager-8jn6x   1/1     Running       1 (118s ago)   5m38s
default     flink-jobmanager-8jn6x   1/1     Running       2 (106s ago)   8m26s
default     flink-jobmanager-8jn6x   1/1     Running       3 (1s ago)     9m41s
default     flink-jobmanager-8jn6x   1/1     Running       4 (1s ago)     12m
default     flink-jobmanager-8jn6x   1/1     Running       5 (0s ago)     15m
default     flink-jobmanager-8jn6x   1/1     Running       6 (1s ago)     18m
default     flink-jobmanager-8jn6x   1/1     Terminating   6 (1s ago)     18m
default     flink-jobmanager-8jn6x   1/1     Terminating   6 (1s ago)     18m

JobManager logs:


-1--
2022-04-23 09:48:21,970 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Triggering checkpoint 33 (type=CHECKPOINT) @ 1650707301963 for job 00000000000000000000000000000000.
2022-04-23 09:48:22,010 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Completed checkpoint 33 for job 00000000000000000000000000000000 (4917 bytes, checkpointDuration=23 ms, finalizationTime=24 ms).
2022-04-23 09:48:26,627 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
2022-04-23 09:48:26,795 WARN  akka.actor.CoordinatedShutdown                               [] - Could not addJvmShutdownHook, due to: Shutdown in progress
2022-04-23 09:48:26,822 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator        [] - Shutting down remote daemon.
2022-04-23 09:48:26,824 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator        [] - Shutting down remote daemon.
2022-04-23 09:48:26,824 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator        [] - Remote daemon shut down; proceeding with flushing remote transports.
2022-04-23 09:48:26,838 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator        [] - Remote daemon shut down; proceeding with flushing remote transports.
2022-04-23 09:48:26,887 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator        [] - Remoting shut down.
2022-04-23 09:48:26,894 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator        [] - Remoting shut down.
---
-2--
2022-04-23 09:51:24,903 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Triggering checkpoint 67 (type=CHECKPOINT) @ 1650707484897 for job 00000000000000000000000000000000.
2022-04-23 09:51:24,943 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Completed checkpoint 67 for job 00000000000000000000000000000000 (4982 bytes, checkpointDuration=21 ms, finalizationTime=25 ms).
2022-04-23 09:51:26,626 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
2022-04-23 09:51:26,840 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator        [] - Shutting down remote daemon.
2022-04-23 09:51:26,845 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator        [] - Shutting down remote daemon.
2022-04-23 09:51:26,847 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator        [] - Remote daemon shut down; proceeding with flushing remote transports.
2022-04-23 09:51:26,848 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator        [] - Remote daemon shut down; proceeding with flushing remote transports.
2022-04-23 09:51:26,871 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator        [] - Remoting shut down.
---
-3--
2022-04-23 09:54:26,625 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
2022-04-23 09:54:26,838 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator        [] - Shutting down remote daemon.
2022-04-23 09:54:26,840 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator        [] - Remote daemon shut down; proceeding with flushing remote transports.
[root@master 02-logger--ckps-nfs-reactive-hpa-zk]#
---
-4--
2022-04-23 09:57:26,627 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
2022-04-23 09:57:26,632 INFO  org.apache.flink.runtime.blob.BlobServer                     [] - Stopped BLOB server at 0.0.0.0:6124
2022-04-23 09:57:26,812 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator        [] - Shutting down remote daemon.
2022-04-23 09:57:26,812 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator        [] - Remote daemon shut down; proceeding with flushing remote transports.
---
-5--
2022-04-23 10:00:26,625 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
2022-04-23 10:00:26,859 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator        [] - Shutting down remote daemon.
2022-04-23 10:00:26,859 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator        [] - Remote daemon shut down; proceeding with flushing remote transports.
2022-04-23 10:00:26,884 WARN  akka.actor.CoordinatedShutdown                               [] - Could not addJvmShutdownHook, due to: Shutdown in progress
---
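
The "every three minutes" observation can be checked against the SIGTERM timestamps in the five excerpts above. The following is a minimal sketch, assuming the timestamps are copied by hand from those log lines; the class name SigtermIntervals and the method intervalsSeconds are hypothetical helpers for illustration and are not part of Flink.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class SigtermIntervals {

    // Matches the timestamp prefix of the JobManager log lines, e.g. "2022-04-23 09:48:26,627".
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Returns the gap between consecutive timestamps, rounded to whole seconds.
    public static List<Long> intervalsSeconds(List<String> timestamps) {
        List<Long> intervals = new ArrayList<>();
        for (int i = 1; i < timestamps.size(); i++) {
            LocalDateTime previous = LocalDateTime.parse(timestamps.get(i - 1), LOG_TIME);
            LocalDateTime current = LocalDateTime.parse(timestamps.get(i), LOG_TIME);
            // Round so that a few milliseconds of jitter do not hide the period.
            long millis = Duration.between(previous, current).toMillis();
            intervals.add(Math.round(millis / 1000.0));
        }
        return intervals;
    }

    public static void main(String[] args) {
        // "RECEIVED SIGNAL 15: SIGTERM" timestamps from excerpts 1 to 5 above.
        List<String> sigterms = List.of(
                "2022-04-23 09:48:26,627",
                "2022-04-23 09:51:26,626",
                "2022-04-23 09:54:26,625",
                "2022-04-23 09:57:26,627",
                "2022-04-23 10:00:26,625");
        System.out.println(intervalsSeconds(sigterms));
    }
}
```

For the five timestamps above this prints [180, 180, 180, 180]. The signal arrives on a fixed 180-second schedule, which suggests (but does not prove) that it is sent on a timer by something outside the job itself rather than triggered by the job's own activity.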

-- updated 2022/04/30 --

Debug logs: https://www.mediafire.com/file/3q8vpzqfnmohgng/debug.log/file

Thanks, all!



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
