'ActiveMQ Artemis master slave error when backup becomes live
I have a master slave setup with 1 master and 2 slaves. When I kill the master, one of the slave tries to become master but fails with following exception:
2022/03/08 16:13:28.746 | mb | ERROR | 1-156 | o.a.a.a.c.server | | AMQ224000: Failure in initialisation: java.lang.IndexOutOfBoundsException: length(32634) exceeds src.readableBytes(32500) where src is: UnpooledHeapByteBuf(ridx: 78, widx: 32578, cap: 32578/32578)
at io.netty.buffer.AbstractByteBuf.checkReadableBounds(AbstractByteBuf.java:643)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1095)
at org.apache.activemq.artemis.core.message.impl.CoreMessage.reloadPersistence(CoreMessage.java:1207)
at org.apache.activemq.artemis.core.message.impl.CoreMessagePersister.decode(CoreMessagePersister.java:85)
at org.apache.activemq.artemis.core.message.impl.CoreMessagePersister.decode(CoreMessagePersister.java:28)
at org.apache.activemq.artemis.spi.core.protocol.MessagePersister.decode(MessagePersister.java:120)
at org.apache.activemq.artemis.core.persistence.impl.journal.AbstractJournalStorageManager.decodeMessage(AbstractJournalStorageManager.java:1336)
at org.apache.activemq.artemis.core.persistence.impl.journal.AbstractJournalStorageManager.lambda$loadMessageJournal$1(AbstractJournalStorageManager.java:1035)
at org.apache.activemq.artemis.utils.collections.SparseArrayLinkedList$SparseArray.clear(SparseArrayLinkedList.java:114)
at org.apache.activemq.artemis.utils.collections.SparseArrayLinkedList.clearSparseArrayList(SparseArrayLinkedList.java:173)
at org.apache.activemq.artemis.utils.collections.SparseArrayLinkedList.clear(SparseArrayLinkedList.java:227)
at org.apache.activemq.artemis.core.persistence.impl.journal.AbstractJournalStorageManager.loadMessageJournal(AbstractJournalStorageManager.java:990)
at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.loadJournals(ActiveMQServerImpl.java:3484)
at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.initialisePart2(ActiveMQServerImpl.java:3149)
at org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:325)
at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$ActivationThread.run(ActiveMQServerImpl.java:4170)
I'm also observing a lot of messages like this one:
2022/03/08 16:13:28.745 | AMQ224009: Cannot find message 36,887,402,768
2022/03/08 16:13:28.745 | AMQ224009: Cannot find message 36,887,402,768
Master setup:
<ha-policy>
<replication>
<master>
<check-for-live-server>true</check-for-live-server>
</master>
</replication>
</ha-policy>
<connectors>
<connector name="connector-server-0">tcp://172.16.134.51:62616</connector>
<connector name="connector-server-1">tcp://172.16.134.52:62616</connector>
<connector name="connector-server-2">tcp://172.16.134.28:62616</connector>
</connectors>
<acceptors>
<acceptor name="netty-acceptor">tcp://172.16.134.51:62616</acceptor>
<acceptor name="invm">vm://0</acceptor>
</acceptors>
<cluster-connections>
<cluster-connection name="my-cluster">
<connector-ref>connector-server-0</connector-ref>
<retry-interval>500</retry-interval>
<use-duplicate-detection>true</use-duplicate-detection>
<message-load-balancing>ON_DEMAND</message-load-balancing>
<max-hops>1</max-hops>
<static-connectors>
<connector-ref>connector-server-1</connector-ref>
<connector-ref>connector-server-2</connector-ref>
</static-connectors>
</cluster-connection>
</cluster-connections>
Slave 1 setup:
<ha-policy>
<replication>
<slave>
<allow-failback>true</allow-failback>
</slave>
</replication>
</ha-policy>
<connectors>
<connector name="connector-server-0">tcp://172.16.134.51:62616</connector>
<connector name="connector-server-1">tcp://172.16.134.52:62616</connector>
<connector name="connector-server-2">tcp://172.16.134.28:62616</connector>
</connectors>
<acceptors>
<acceptor name="netty-acceptor">tcp://172.16.134.52:62616</acceptor>
<acceptor name="invm">vm://0</acceptor>
</acceptors>
<cluster-connections>
<cluster-connection name="cluster">
<connector-ref>connector-server-1</connector-ref>
<retry-interval>500</retry-interval>
<use-duplicate-detection>true</use-duplicate-detection>
<message-load-balancing>ON_DEMAND</message-load-balancing>
<max-hops>1</max-hops>
<static-connectors>
<connector-ref>connector-server-0</connector-ref>
<connector-ref>connector-server-2</connector-ref>
</static-connectors>
</cluster-connection>
</cluster-connections>
Slave 2
<ha-policy>
<replication>
<slave>
<allow-failback>true</allow-failback>
</slave>
</replication>
</ha-policy>
<connectors>
<connector name="connector-server-0">tcp://172.16.134.51:62616</connector>
<connector name="connector-server-1">tcp://172.16.134.52:62616</connector>
<connector name="connector-server-2">tcp://172.16.134.28:62616</connector>
</connectors>
<acceptors>
<acceptor name="netty-acceptor">tcp://172.16.134.28:62616</acceptor>
<acceptor name="invm">vm://0</acceptor>
</acceptors>
<cluster-connections>
<cluster-connection name="cluster">
<connector-ref>connector-server-2</connector-ref>
<retry-interval>500</retry-interval>
<use-duplicate-detection>true</use-duplicate-detection>
<message-load-balancing>ON_DEMAND</message-load-balancing>
<max-hops>1</max-hops>
<static-connectors>
<connector-ref>connector-server-0</connector-ref>
<connector-ref>connector-server-1</connector-ref>
</static-connectors>
</cluster-connection>
</cluster-connections>
Could you please tell me what is not correct in my setup? I'm using activemq-artemis version 2.17.0
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
