Infinispan distributed cache hangs after merged cluster view and topology change
I am currently analyzing a cluster environment with a distributed cache.
The distributed caches on all nodes are unable to rebalance or commit any transaction after nodes leave and rejoin the cluster. We manage 100K cache entries. The cluster has 12 nodes, and each cache entry has 7 owners.
2022-03-17 12:29:52,144 INFO [org.infinispan.CLUSTER] ISPN000094: Received new cluster view for channel ee-cache: [web02|12] (6) [web02, import02, profile02, import04, web04, profile04]
2022-03-17 12:29:52,378 INFO [org.infinispan.CLUSTER] ISPN100001: Node web01 left the cluster
2022-03-17 12:29:52,410 INFO [org.infinispan.CLUSTER] ISPN100001: Node import01 left the cluster
2022-03-17 12:29:52,410 INFO [org.infinispan.CLUSTER] ISPN100001: Node web03 left the cluster
2022-03-17 12:29:52,410 INFO [org.infinispan.CLUSTER] ISPN100001: Node profile03 left the cluster
2022-03-17 12:29:52,411 INFO [org.infinispan.CLUSTER] ISPN100001: Node profile01 left the cluster
2022-03-17 12:29:52,411 INFO [org.infinispan.CLUSTER] ISPN100001: Node import03 left the cluster
...
2022-03-17 12:38:52,743 INFO [org.infinispan.CLUSTER] ISPN000093: Received new, MERGED cluster view for channel ee-cache: MergeView::[web01|14] (12) [web01, import01, web02, profile03, import02, profile01, profile02, import03, import04, profile04, web03, web04], 2 subgroups: [web01|11] (12) [web01, import01, web03, web02, profile03, import02, profile01, profile02, import03, import04, web04, profile04], [web03|13] (9) [web03, web02, import02, profile02, import04, web04, profile04, profile03, import03]
2022-03-17 12:38:52,745 INFO [org.infinispan.CLUSTER] ISPN100000: Node web01 joined the cluster
2022-03-17 12:38:52,745 INFO [org.infinispan.CLUSTER] ISPN100000: Node import01 joined the cluster
2022-03-17 12:38:52,745 INFO [org.infinispan.CLUSTER] ISPN100000: Node profile01 joined the cluster
...
2022-03-17 12:38:01,005 INFO [org.infinispan.CLUSTER] ISPN000093: Received new, MERGED cluster view for channel ee-cache: MergeView::[web03|13] (9) [web03, web02, import02, profile02, import04, web04, profile04, profile03, import03], 1 subgroups: [web02|12] (6) [web02, import02, profile02, import04, web04, profile04]
2022-03-17 12:38:01,006 INFO [org.infinispan.CLUSTER] ISPN100000: Node web03 joined the cluster
2022-03-17 12:38:01,006 INFO [org.infinispan.CLUSTER] ISPN100000: Node profile03 joined the cluster
2022-03-17 12:38:01,006 INFO [org.infinispan.CLUSTER] ISPN100000: Node import03 joined the cluster
...
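To follow the membership changes in the views above, a small stdlib-only helper like the one below can diff the member lists of two views. It is a hypothetical sketch, not an Infinispan or JGroups API. It assumes the view format printed by ISPN000093/ISPN000094 ("[coordinator|id] (size) [members]") and that the view preceding [web02|12] was [web01|11], the 12-member subgroup listed in the merged view.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: diffs the member lists of two JGroups views as printed in the log.
class ViewDiff {
    // "[web02|12] (6) [web02, import02, ...]" -> coordinator, view id, size, members
    private static final Pattern VIEW =
            Pattern.compile("\\[(\\w+)\\|(\\d+)\\] \\((\\d+)\\) \\[([^\\]]*)\\]");

    // Members of the first view found in the string; checks them against the size field.
    static List<String> members(String view) {
        Matcher m = VIEW.matcher(view);
        if (!m.find()) {
            throw new IllegalArgumentException("no view found in: " + view);
        }
        List<String> result = Arrays.asList(m.group(4).split(",\\s*"));
        if (result.size() != Integer.parseInt(m.group(3))) {
            throw new IllegalStateException("size field does not match member list: " + view);
        }
        return result;
    }

    // Members present in 'after' but not in 'before', in the order of 'after'.
    static List<String> joined(String before, String after) {
        Set<String> diff = new LinkedHashSet<>(members(after));
        diff.removeAll(members(before));
        return new ArrayList<>(diff);
    }

    // Members present in 'before' but not in 'after', in the order of 'before'.
    static List<String> left(String before, String after) {
        return joined(after, before);
    }

    public static void main(String[] args) {
        String v11 = "[web01|11] (12) [web01, import01, web03, web02, profile03, import02, profile01, profile02, import03, import04, web04, profile04]";
        String v12 = "[web02|12] (6) [web02, import02, profile02, import04, web04, profile04]";
        String v13 = "[web03|13] (9) [web03, web02, import02, profile02, import04, web04, profile04, profile03, import03]";
        String v14 = "[web01|14] (12) [web01, import01, web02, profile03, import02, profile01, profile02, import03, import04, profile04, web03, web04]";

        System.out.println("11 -> 12 left:   " + left(v11, v12));
        System.out.println("12 -> 13 joined: " + joined(v12, v13));
        System.out.println("13 -> 14 joined: " + joined(v13, v14));
    }
}
```

In chronological order, [web03|13] (12:38:01) precedes [web01|14] (12:38:52). The diff reproduces the ISPN100001/ISPN100000 messages: web01, import01, web03, profile03, profile01 and import03 leave between views 11 and 12, web03, profile03 and import03 rejoin in view 13, and web01, import01 and profile01 rejoin in view 14. Note also that the [web01|11] subgroup still lists all 12 members, so the two merged subgroups overlap.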
2022-03-17 12:42:36,746 WARN [org.infinispan.transaction.impl.TransactionTable] ISPN000326: Remote transaction RecoveryAwareGlobalTransaction{xid=Xid{formatId=131077, globalTransactionId=00000000000000000000FFFF0A0EC20BF0C59BB56231C906004A96B8696D706F72743031,branchQualifier=00000000000000000000FFFF0A0EC20BF0C59BB56231C906004A96C50000000000000000}, internalId=3096233333820528} GlobalTx:import01:211500 timed out. Rolling back after 206482 ms
...
2022-03-17 14:18:06,776 WARN [org.infinispan.transaction.impl.TransactionTable] ISPN000326: Remote transaction RecoveryAwareGlobalTransaction{xid=Xid{formatId=131077, globalTransactionId=00000000000000000000FFFF0A0EC229426AB2D86231C917005EBD6C696D706F72743034,branchQualifier=00000000000000000000FFFF0A0EC229426AB2D86231C917005EBD760000000000000000}, internalId=3377716900554309} GlobalTx:import04:231182 timed out. Rolling back after 199021 ms
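The XIDs in the two ISPN000326 warnings can be decoded to check where each transaction was started. The sketch below is hypothetical and rests on an assumption not confirmed by the logs: that the Narayana globalTransactionId begins with a 16-byte IPv4-mapped IPv6 address and ends with the transaction manager's 8-byte node identifier, which is decoded as ASCII.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for inspecting the XIDs printed in ISPN000326 warnings.
class XidNode {
    // Converts a hex string such as "696D706F7274" into raw bytes.
    static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Decodes the last n bytes of the globalTransactionId as ASCII.
    // ASSUMPTION: the trailing bytes hold the transaction manager's node identifier.
    static String trailingAscii(String globalTxIdHex, int n) {
        byte[] b = hexToBytes(globalTxIdHex);
        return new String(b, b.length - n, n, StandardCharsets.US_ASCII);
    }

    // Reads bytes 12..15 as an IPv4 address.
    // ASSUMPTION: the id starts with a 16-byte IPv4-mapped IPv6 address (::ffff:a.b.c.d).
    static String embeddedIpv4(String globalTxIdHex) {
        byte[] b = hexToBytes(globalTxIdHex);
        return (b[12] & 0xFF) + "." + (b[13] & 0xFF) + "." + (b[14] & 0xFF) + "." + (b[15] & 0xFF);
    }

    public static void main(String[] args) {
        String[] ids = {
            "00000000000000000000FFFF0A0EC20BF0C59BB56231C906004A96B8696D706F72743031",
            "00000000000000000000FFFF0A0EC229426AB2D86231C917005EBD6C696D706F72743034"
        };
        for (String id : ids) {
            System.out.println(embeddedIpv4(id) + " " + trailingAscii(id, 8));
        }
    }
}
```

With these inputs it prints "10.14.194.11 import01" and "10.14.194.41 import04", which agrees with the GlobalTx:import01:211500 and GlobalTx:import04:231182 suffixes in the same warnings.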
The transaction no longer exists on the other nodes.
2022-03-17 12:43:10,269 ERROR [org.infinispan.transaction.impl.TransactionCoordinator] ISPN000255: Error while processing prepare: org.infinispan.remoting.RemoteException: ISPN000217: Received exception from profile02, see cause for remote stack trace
at org.infinispan.remoting.transport.ResponseCollectors.wrapRemoteException(ResponseCollectors.java:28)
at org.infinispan.remoting.transport.impl.MapResponseCollector.addException(MapResponseCollector.java:65)
at org.infinispan.remoting.transport.impl.MapResponseCollector$IgnoreLeavers.addException(MapResponseCollector.java:103)
at org.infinispan.remoting.transport.ValidResponseCollector.addResponse(ValidResponseCollector.java:29)
at org.infinispan.remoting.transport.impl.MultiTargetRequest.onResponse(MultiTargetRequest.java:91)
at org.infinispan.remoting.transport.impl.RequestRepository.addResponse(RequestRepository.java:52)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processResponse(JGroupsTransport.java:1369)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processMessage(JGroupsTransport.java:1272)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.access$300(JGroupsTransport.java:126)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport$ChannelCallbacks.up(JGroupsTransport.java:1417)
at org.jgroups.JChannel.up(JChannel.java:816)
at org.jgroups.fork.ForkProtocolStack.up(ForkProtocolStack.java:134)
at org.jgroups.stack.Protocol.up(Protocol.java:339)
at org.jgroups.protocols.FORK.up(FORK.java:142)
at org.jgroups.protocols.FRAG3.up(FRAG3.java:171)
at org.jgroups.protocols.FlowControl.up(FlowControl.java:339)
at org.jgroups.protocols.FlowControl.up(FlowControl.java:339)
at org.jgroups.protocols.pbcast.GMS.up(GMS.java:872)
at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:240)
at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1008)
at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:734)
at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:389)
at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:590)
at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:131)
at org.jgroups.stack.Protocol.up(Protocol.java:339)
at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:203)
at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:253)
at org.jgroups.protocols.MERGE3.up(MERGE3.java:280)
at org.jgroups.protocols.Discovery.up(Discovery.java:295)
at org.jgroups.protocols.TP.passMessageUp(TP.java:1250)
at org.jgroups.util.SubmitToThreadPool$SingleMessageHandler.run(SubmitToThreadPool.java:87)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Suppressed: org.infinispan.util.logging.TraceException
at org.infinispan.interceptors.impl.SimpleAsyncInvocationStage.get(SimpleAsyncInvocationStage.java:41)
at org.infinispan.interceptors.impl.AsyncInterceptorChainImpl.invoke(AsyncInterceptorChainImpl.java:250)
at org.infinispan.transaction.impl.TransactionCoordinator.prepare(TransactionCoordinator.java:120)
at org.infinispan.transaction.impl.TransactionCoordinator.prepare(TransactionCoordinator.java:103)
at org.infinispan.transaction.xa.XaTransactionTable.prepare(XaTransactionTable.java:110)
at org.infinispan.transaction.xa.TransactionXaAdapter.prepare(TransactionXaAdapter.java:60)
at org.jboss.jts//com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord.topLevelPrepare(XAResourceRecord.java:214)
at org.jboss.jts//com.arjuna.ats.arjuna.coordinator.BasicAction.doPrepare(BasicAction.java:2673)
at org.jboss.jts//com.arjuna.ats.arjuna.coordinator.BasicAction.doPrepare(BasicAction.java:2623)
at org.jboss.jts//com.arjuna.ats.arjuna.coordinator.BasicAction.prepare(BasicAction.java:2157)
at org.jboss.jts//com.arjuna.ats.arjuna.coordinator.BasicAction.End(BasicAction.java:1503)
at org.jboss.jts//com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.end(TwoPhaseCoordinator.java:96)
at org.jboss.jts//com.arjuna.ats.arjuna.AtomicAction.commit(AtomicAction.java:162)
at org.jboss.jts//com.arjuna.ats.internal.jta.transaction.arjunacore.TransactionImple.commitAndDisassociate(TransactionImple.java:1287)
at org.jboss.jts//com.arjuna.ats.internal.jta.transaction.arjunacore.BaseTransaction.commit(BaseTransaction.java:126)
at org.jboss.jts.integration//com.arjuna.ats.jbossatx.BaseTransactionManagerDelegate.commit(BaseTransactionManagerDelegate.java:94)
at org.wildfly.transaction.client.LocalTransaction.commitAndDissociate(LocalTransaction.java:75)
at org.wildfly.transaction.client.ContextTransactionManager.commit(ContextTransactionManager.java:71)
at org.wildfly.transaction.client.LocalUserTransaction.commit(LocalUserTransaction.java:53)
Caused by: org.infinispan.commons.CacheException: Remote transaction for global transaction (RecoveryAwareGlobalTransaction{xid=Xid{formatId=131077, globalTransactionId=00000000000000000000FFFF0A0EC20BF0C59BB56231C906004A96B8696D706F72743031,branchQualifier=00000000000000000000FFFF0A0EC20BF0C59BB56231C906004A96C50000000000000000}, internalId=3096233333820528} GlobalTx:import01:211500) not found
at org.infinispan.transaction.xa.recovery.RecoveryAwareTransactionTable.remoteTransactionPrepared(RecoveryAwareTransactionTable.java:44)
at org.infinispan.interceptors.impl.TxInterceptor.lambda$handlePrepareCommand$1(TxInterceptor.java:151)
at org.infinispan.interceptors.InvocationSuccessAction.apply(InvocationSuccessAction.java:22)
at org.infinispan.interceptors.SyncInvocationStage.addCallback(SyncInvocationStage.java:42)
at org.infinispan.interceptors.InvocationStage.thenAccept(InvocationStage.java:50)
at org.infinispan.interceptors.impl.TxInterceptor.handlePrepareCommand(TxInterceptor.java:146)
at org.infinispan.interceptors.impl.TxInterceptor.visitPrepareCommand(TxInterceptor.java:127)
at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:187)
at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:54)
at org.infinispan.interceptors.BaseAsyncInterceptor.lambda$new$0(BaseAsyncInterceptor.java:22)
at org.infinispan.interceptors.InvocationSuccessFunction.apply(InvocationSuccessFunction.java:25)
at org.infinispan.interceptors.impl.SimpleAsyncInvocationStage.addCallback(SimpleAsyncInvocationStage.java:70)
at org.infinispan.interceptors.InvocationStage.thenApply(InvocationStage.java:45)
at org.infinispan.interceptors.BaseAsyncInterceptor.asyncInvokeNext(BaseAsyncInterceptor.java:224)
at org.infinispan.statetransfer.TransactionSynchronizerInterceptor.visitCommand(TransactionSynchronizerInterceptor.java:46)
at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextAndHandle(BaseAsyncInterceptor.java:185)
at org.infinispan.statetransfer.StateTransferInterceptor.handleTxCommand(StateTransferInterceptor.java:203)
at org.infinispan.statetransfer.StateTransferInterceptor.visitPrepareCommand(StateTransferInterceptor.java:69)
at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:187)
at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:54)
at org.infinispan.interceptors.DDAsyncInterceptor.handleDefault(DDAsyncInterceptor.java:54)
at org.infinispan.interceptors.DDAsyncInterceptor.visitPrepareCommand(DDAsyncInterceptor.java:132)
at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:187)
at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextAndExceptionally(BaseAsyncInterceptor.java:123)
at org.infinispan.interceptors.impl.InvocationContextInterceptor.visitCommand(InvocationContextInterceptor.java:90)
at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:56)
at org.infinispan.interceptors.DDAsyncInterceptor.handleDefault(DDAsyncInterceptor.java:54)
at org.infinispan.interceptors.DDAsyncInterceptor.visitPrepareCommand(DDAsyncInterceptor.java:132)
at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:187)
at org.infinispan.interceptors.DDAsyncInterceptor.visitCommand(DDAsyncInterceptor.java:50)
at org.infinispan.interceptors.impl.AsyncInterceptorChainImpl.invokeAsync(AsyncInterceptorChainImpl.java:234)
at org.infinispan.commands.tx.PrepareCommand.invokeAsync(PrepareCommand.java:110)
at org.infinispan.remoting.inboundhandler.BasePerCacheInboundInvocationHandler.invokeCommand(BasePerCacheInboundInvocationHandler.java:117)
at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.invoke(BaseBlockingRunnable.java:99)
at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.runAsync(BaseBlockingRunnable.java:71)
at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.run(BaseBlockingRunnable.java:40)
... 3 more
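It may help to put the log timestamps for GlobalTx:import01:211500 on a single timeline. The sketch below uses only java.time. It assumes that all three lines were logged on the same day by nodes with synchronized clocks, and that "Rolling back after 206482 ms" is measured from the creation of the remote transaction. Both are assumptions; the excerpts do not say which node logged which line.

```java
import java.time.Duration;
import java.time.LocalTime;

// Hypothetical helper: puts the log timestamps for GlobalTx:import01:211500 on one timeline.
class TxTimeline {
    // Creation time implied by "timed out. Rolling back after N ms" (ISPN000326).
    static LocalTime impliedStart(LocalTime loggedAt, long rolledBackAfterMs) {
        return loggedAt.minus(Duration.ofMillis(rolledBackAfterMs));
    }

    public static void main(String[] args) {
        // Timestamps copied from the excerpts above (comma replaced by a dot).
        LocalTime mergeView = LocalTime.parse("12:38:52.743");    // ISPN000093, [web01|14]
        LocalTime rollback = LocalTime.parse("12:42:36.746");     // ISPN000326 for import01:211500
        LocalTime prepareError = LocalTime.parse("12:43:10.269"); // ISPN000255 for the same tx

        LocalTime created = impliedStart(rollback, 206_482);
        System.out.println("remote tx created:       " + created);
        System.out.println("merge view -> created:   " + Duration.between(mergeView, created));
        System.out.println("rollback -> prepare err: " + Duration.between(rollback, prepareError));
        System.out.println("created -> prepare err:  " + Duration.between(created, prepareError));
    }
}
```

Under those assumptions the remote transaction was created at about 12:39:10.264, roughly 17.5 s after the merged view [web01|14], and the failing prepare arrived about 33.5 s after the rollback. The prepare error is also logged almost exactly 240 s after that creation time, which happens to match the configured remote-timeout and acquire-timeout of 240000 ms; whether that is more than a coincidence cannot be seen from these excerpts.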
We use Infinispan 9.4 within WildFly 19.10.1. Below is the configuration of the Infinispan and JGroups subsystems.
<subsystem xmlns="urn:infinispan:server:core:9.4">
<cache-container name="cache" default-cache="default" module="org.infinispan.extension:ispn-9.4">
<transport channel="ee-cache" lock-timeout="60000" initial-cluster-size="${app.infinispan.transport.initial_cluster_size:2}" initial-cluster-timeout="${app.infinispan.transport.initial_cluster_timeout:60000}"/>
...
<distributed-cache name="bobject" owners="${app.infinispan.num_owners:2}" remote-timeout="240000">
<state-transfer enabled="false"/>
<locking isolation="READ_COMMITTED" acquire-timeout="240000" concurrency-level="1000"/>
<transaction mode="FULL_XA" locking="OPTIMISTIC" stop-timeout="10000"/>
<memory>
<object size="100000"/>
</memory>
<expiration max-idle="10800000" lifespan="-1"/>
<partition-handling when-split="ALLOW_READ_WRITES" merge-policy="REMOVE_ALL"/>
</distributed-cache>
...
</cache-container>
</subsystem>
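To relate the 7 owners and 100K entries to the object size="100000" memory limit above, here is a back-of-the-envelope sketch. It assumes that all 100K entries live in the bobject cache, that entries are spread evenly over segments, that the memory limit applies per node, and that the effective number of owners is capped at the number of members. It estimates what the consistent hash would assign to each node, not what a node actually holds while state-transfer enabled="false" is set.

```java
// Hypothetical back-of-the-envelope estimate, not an Infinispan API.
class OwnershipEstimate {
    // Expected entries assigned to each node, assuming an even segment distribution and
    // that the effective number of owners is capped at the number of members.
    static double entriesPerNode(long entries, int numOwners, int members) {
        int effectiveOwners = Math.min(numOwners, members);
        return (double) entries * effectiveOwners / members;
    }

    public static void main(String[] args) {
        long entries = 100_000;     // "We manage 100K cache entries"
        int numOwners = 7;          // 7 owners, as stated in the question
        long memoryLimit = 100_000; // <object size="100000"/>, assumed to apply per node

        for (int members : new int[] {12, 9, 6}) { // sizes of views 14, 13 and 12
            double perNode = entriesPerNode(entries, numOwners, members);
            System.out.printf("%2d members: ~%.0f entries per node (limit %d)%n",
                    members, perNode, memoryLimit);
        }
    }
}
```

With 12 members each node would be assigned about 58,333 entries and with 9 members about 77,778. In the 6-member partition [web02|12], however, 7 owners exceed the member count, so every node owns every segment and would be assigned all 100,000 entries, exactly the configured limit.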
JGroups subsystem (the tcpping stack is used):
<subsystem xmlns="urn:infinispan:server:jgroups:9.4">
<channels default="ee-cache">
<channel name="ee-cache"/>
</channels>
<stacks default="${jboss.default.jgroups.stack:udp}">
<stack name="tcpping">
<transport type="TCP" socket-binding="jgroups-tcp" module="org.jgroups:ispn-9.4"/>
<protocol type="org.jgroups.protocols.TCPPING" module="org.jgroups:ispn-9.4">
<property name="initial_hosts">
${app.jgroups.tcpping.initial_hosts}
</property>
<property name="port_range">
${app.jgroups.tcpping.port_range:10}
</property>
</protocol>
<protocol type="MERGE3" module="org.jgroups:ispn-9.4"/>
<protocol type="FD_SOCK" socket-binding="jgroups-tcp-fd" module="org.jgroups:ispn-9.4"/>
<protocol type="FD_ALL" module="org.jgroups:ispn-9.4"/>
<protocol type="FD_HOST" module="org.jgroups:ispn-9.4"/>
<protocol type="VERIFY_SUSPECT" module="org.jgroups:ispn-9.4"/>
<protocol type="pbcast.NAKACK2" module="org.jgroups:ispn-9.4">
<property name="use_mcast_xmit">
false
</property>
</protocol>
<protocol type="UNICAST3" module="org.jgroups:ispn-9.4"/>
<protocol type="pbcast.STABLE" module="org.jgroups:ispn-9.4"/>
<protocol type="pbcast.GMS" module="org.jgroups:ispn-9.4">
<property name="view_ack_collection_timeout">
${app.jgroups.tcpping.view_ack_collection_timeout:30000}
</property>
<property name="join_timeout">
${app.jgroups.tcpping.join_timeout:3000}
</property>
</protocol>
<protocol type="MFC" module="org.jgroups:ispn-9.4"/>
<protocol type="FRAG3" module="org.jgroups:ispn-9.4"/>
</stack>
...
</stacks>
</subsystem>
Do you have any suggestions for further analysis?
Many thanks!
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
