HBase connections timing out after region server died

Whenever a region server dies or stops unexpectedly, our HBase client connections start timing out and do not reconnect to the remaining live region servers. I always have to restart the entire application to fix the issue. Relevant error logs are below.

2022-02-28 23:34:20,055 ERROR AsyncProcess [hconnection-0x1bcf2c64-shared--pool1-t82] Internal AsyncProcess #1 error for <hbase table> processing for <region-server-1>,16020,1646044038281

java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ArrayStoreException: org.apache.hadoop.hbase.ipc.FailedServerException
        at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncProcess.java:759)
        at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.sendMultiAction(AsyncProcess.java:1010)
        at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.groupAndSendMultiAction(AsyncProcess.java:919)
        at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.resubmit(AsyncProcess.java:1237)
        at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.receiveGlobalFailure(AsyncProcess.java:1198)
        at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.access$1200(AsyncProcess.java:600)
        at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncProcess.java:743)
        at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.sendMultiAction(AsyncProcess.java:1010)
        at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.groupAndSendMultiAction(AsyncProcess.java:919)
        at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.resubmit(AsyncProcess.java:1237)
        at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.receiveGlobalFailure(AsyncProcess.java:1198)
        at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.access$1200(AsyncProcess.java:600)
        at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncProcess.java:743)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_202]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_202]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_202]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_202]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_202]

Some configs:

  • HBase version: 2.1.7
  • Cluster replication factor: 3
  • RPC Timeout: 5000ms
  • ZookeeperSessionTimeout: 60000ms
  • ClientScannerTimeoutPeriod: 60000ms
  • ClientRetryCount: 3
  • ClientOperationTimeout: 1200000ms
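
For reference, here is how I believe the client settings above map to standard HBase property keys. This is only a sketch: the labels in the list are informal, so the mapping of each label to a key is my assumption, and the class and method names (HBaseClientSettings, fromQuestion) are made up for illustration. It uses only java.util so it runs without HBase on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of HBase: collects the client settings
// from the list above under the property keys they most likely refer to.
final class HBaseClientSettings {

    // The label-to-key mapping is an assumption; the values are the ones
    // listed in the question (all timeouts in milliseconds).
    static Map<String, String> fromQuestion() {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("hbase.rpc.timeout", "5000");                    // RPC Timeout
        settings.put("zookeeper.session.timeout", "60000");           // ZookeeperSessionTimeout
        settings.put("hbase.client.scanner.timeout.period", "60000"); // ClientScannerTimeoutPeriod
        settings.put("hbase.client.retries.number", "3");             // ClientRetryCount
        settings.put("hbase.client.operation.timeout", "1200000");    // ClientOperationTimeout
        return settings;
    }

    public static void main(String[] args) {
        // Print each setting as key=value, in insertion order.
        fromQuestion().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

With hbase-client available, each entry could be applied with Configuration.set(key, value) on the object returned by HBaseConfiguration.create() before the connection is created.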

Do let me know if I need to provide any other information. Looking for some help in identifying and fixing the issue. Thanks!



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow