ActiveMQ Artemis master/slave error when backup becomes live

I have a master/slave setup with 1 master and 2 slaves. When I kill the master, one of the slaves tries to become the new master but fails with the following exception:

2022/03/08 16:13:28.746 | mb | ERROR | 1-156 | o.a.a.a.c.server                         |                                      | AMQ224000: Failure in initialisation: java.lang.IndexOutOfBoundsException: length(32634) exceeds src.readableBytes(32500) where src is: UnpooledHeapByteBuf(ridx: 78, widx: 32578, cap: 32578/32578)
    at io.netty.buffer.AbstractByteBuf.checkReadableBounds(AbstractByteBuf.java:643)
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1095)
    at org.apache.activemq.artemis.core.message.impl.CoreMessage.reloadPersistence(CoreMessage.java:1207)
    at org.apache.activemq.artemis.core.message.impl.CoreMessagePersister.decode(CoreMessagePersister.java:85)
    at org.apache.activemq.artemis.core.message.impl.CoreMessagePersister.decode(CoreMessagePersister.java:28)
    at org.apache.activemq.artemis.spi.core.protocol.MessagePersister.decode(MessagePersister.java:120)
    at org.apache.activemq.artemis.core.persistence.impl.journal.AbstractJournalStorageManager.decodeMessage(AbstractJournalStorageManager.java:1336)
    at org.apache.activemq.artemis.core.persistence.impl.journal.AbstractJournalStorageManager.lambda$loadMessageJournal$1(AbstractJournalStorageManager.java:1035)
    at org.apache.activemq.artemis.utils.collections.SparseArrayLinkedList$SparseArray.clear(SparseArrayLinkedList.java:114)
    at org.apache.activemq.artemis.utils.collections.SparseArrayLinkedList.clearSparseArrayList(SparseArrayLinkedList.java:173)
    at org.apache.activemq.artemis.utils.collections.SparseArrayLinkedList.clear(SparseArrayLinkedList.java:227)
    at org.apache.activemq.artemis.core.persistence.impl.journal.AbstractJournalStorageManager.loadMessageJournal(AbstractJournalStorageManager.java:990)
    at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.loadJournals(ActiveMQServerImpl.java:3484)
    at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.initialisePart2(ActiveMQServerImpl.java:3149)
    at org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:325)
    at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$ActivationThread.run(ActiveMQServerImpl.java:4170)

I'm also observing a large number of messages like these:

2022/03/08 16:13:28.745 | AMQ224009: Cannot find message 36,887,402,768
2022/03/08 16:13:28.745 | AMQ224009: Cannot find message 36,887,402,768

Master setup:

<ha-policy>
   <replication>
      <master>
         <check-for-live-server>true</check-for-live-server>
      </master>
   </replication>
</ha-policy>
<connectors>
   <connector name="connector-server-0">tcp://172.16.134.51:62616</connector>
   <connector name="connector-server-1">tcp://172.16.134.52:62616</connector>
   <connector name="connector-server-2">tcp://172.16.134.28:62616</connector>
</connectors>
<acceptors>
   <acceptor name="netty-acceptor">tcp://172.16.134.51:62616</acceptor>
   <acceptor name="invm">vm://0</acceptor>
</acceptors>
<cluster-connections>
   <cluster-connection name="my-cluster">
      <connector-ref>connector-server-0</connector-ref>
      <retry-interval>500</retry-interval>
      <use-duplicate-detection>true</use-duplicate-detection>
      <message-load-balancing>ON_DEMAND</message-load-balancing>
      <max-hops>1</max-hops>
      <static-connectors>
         <connector-ref>connector-server-1</connector-ref>
         <connector-ref>connector-server-2</connector-ref>
      </static-connectors>
   </cluster-connection>
</cluster-connections>

Slave 1 setup:

<ha-policy>
   <replication>
      <slave>
         <allow-failback>true</allow-failback>
      </slave>
   </replication>
</ha-policy>
<connectors>
   <connector name="connector-server-0">tcp://172.16.134.51:62616</connector>
   <connector name="connector-server-1">tcp://172.16.134.52:62616</connector>
   <connector name="connector-server-2">tcp://172.16.134.28:62616</connector>
</connectors>
<acceptors>
   <acceptor name="netty-acceptor">tcp://172.16.134.52:62616</acceptor>
   <acceptor name="invm">vm://0</acceptor>
</acceptors>
<cluster-connections>
   <cluster-connection name="cluster">
      <connector-ref>connector-server-1</connector-ref>
      <retry-interval>500</retry-interval>
      <use-duplicate-detection>true</use-duplicate-detection>
      <message-load-balancing>ON_DEMAND</message-load-balancing>
      <max-hops>1</max-hops>
      <static-connectors>
         <connector-ref>connector-server-0</connector-ref>
         <connector-ref>connector-server-2</connector-ref>
      </static-connectors>
   </cluster-connection>
</cluster-connections>

Slave 2 setup:

<ha-policy>
   <replication>
      <slave>
         <allow-failback>true</allow-failback>
      </slave>
   </replication>
</ha-policy>
<connectors>
   <connector name="connector-server-0">tcp://172.16.134.51:62616</connector>
   <connector name="connector-server-1">tcp://172.16.134.52:62616</connector>
   <connector name="connector-server-2">tcp://172.16.134.28:62616</connector>
</connectors>
<acceptors>
   <acceptor name="netty-acceptor">tcp://172.16.134.28:62616</acceptor>
   <acceptor name="invm">vm://0</acceptor>
</acceptors>
<cluster-connections>
  <cluster-connection name="cluster">
      <connector-ref>connector-server-2</connector-ref>
      <retry-interval>500</retry-interval>
      <use-duplicate-detection>true</use-duplicate-detection>
      <message-load-balancing>ON_DEMAND</message-load-balancing>
      <max-hops>1</max-hops>
      <static-connectors>
         <connector-ref>connector-server-0</connector-ref>
         <connector-ref>connector-server-1</connector-ref>
      </static-connectors>
   </cluster-connection>
</cluster-connections>
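For context, none of my brokers sets a `group-name` in its `ha-policy`, so (as I understand the Artemis HA documentation) either backup is free to pair with the master. I have not tried pinning a specific backup to the master, but my understanding is that such a pairing would look like the sketch below (untested, based on the docs; `pair-a` is just an illustrative name):

```xml
<!-- On the master (sketch, not something I have tested): -->
<ha-policy>
   <replication>
      <master>
         <check-for-live-server>true</check-for-live-server>
         <group-name>pair-a</group-name>
      </master>
   </replication>
</ha-policy>

<!-- On the one backup that should replicate from that master: -->
<ha-policy>
   <replication>
      <slave>
         <allow-failback>true</allow-failback>
         <group-name>pair-a</group-name>
      </slave>
   </replication>
</ha-policy>
```

I mention this only in case the unpinned pairing is relevant to the failure above.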

Could you please tell me what is incorrect in my setup? I'm using ActiveMQ Artemis version 2.17.0.



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
