JDBC_PING cannot discover instances (Keycloak HA mode on AWS)
I am trying to set up Keycloak on 2 AWS EC2 instances in HA mode, using JDBC_PING connected to an RDS DB (MySQL 5.7). At startup the instances try to discover each other via "org.jgroups.protocols.pbcast.GMS", but the join times out, after which each instance falls back to standalone mode. It looks as if the instances cannot reach each other: tcpdump shows no inbound or outbound traffic directly between them, yet both actively talk to the DB, and the JGROUPS table is periodically updated with ping data. I cannot work out what the problem is. The logs are below.
INSTANCE 1
2018-01-12 17:53:01,264 DEBUG [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 51) address=ip-192-168-33-243, cluster=ee, physical address=0.0.0.0:7600
2018-01-12 17:53:31,305 TRACE [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 51) ip-192-168-33-243: no members discovered after 30033 ms: creating cluster as first member
2018-01-12 17:53:31,307 DEBUG [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 51) ip-192-168-33-243: installing view [ip-192-168-33-243|0] (1) [ip-192-168-33-243]
2018-01-12 17:53:31,370 DEBUG [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 51) ip-192-168-33-243: created cluster (first member). My view is [ip-192-168-33-243|0], impl is org.jgroups.protocols.pbcast.CoordGmsImpl
INSTANCE 2
2018-01-14 17:27:55,458 WARN [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 51) ip-192-168-27-128: JOIN(ip-192-168-27-128) sent to ip-192-168-33-243 timed out (after 30000 ms), on try 2
2018-01-14 17:27:55,489 TRACE [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 51) ip-192-168-27-128: discovery took 31 ms, members: 411 rsps (2 coords) [done]
2018-01-14 17:27:55,489 DEBUG [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 51) ip-192-168-27-128: found multiple coords: [ip-192-168-33-243, ip-192-168-27-128]
2018-01-14 17:27:55,489 DEBUG [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 51) ip-192-168-27-128: sending JOIN(ip-192-168-27-128) to ip-192-168-33-243
2018-01-14 17:28:25,490 WARN [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 51) ip-192-168-27-128: JOIN(ip-192-168-27-128) sent to ip-192-168-33-243 timed out (after 30000 ms), on try 3
2018-01-14 17:28:25,490 DEBUG [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 51) ip-192-168-27-128: sending JOIN(ip-192-168-27-128) to ip-192-168-27-128
2018-01-14 17:28:55,490 WARN [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 51) ip-192-168-27-128: JOIN(ip-192-168-27-128) sent to ip-192-168-27-128 timed out (after 30000 ms), on try 3
2018-01-14 17:28:55,521 TRACE [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 51) ip-192-168-27-128: discovery took 30 ms, members: 411 rsps (2 coords) [done]
2018-01-14 17:28:55,521 DEBUG [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 51) ip-192-168-27-128: found multiple coords: [ip-192-168-33-243, ip-192-168-27-128]
2018-01-14 17:28:55,521 DEBUG [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 51) ip-192-168-27-128: sending JOIN(ip-192-168-27-128) to ip-192-168-27-128
Excerpt from the Keycloak standalone-ha.xml file:
<subsystem xmlns="urn:jboss:domain:jgroups:5.0">
    <channels default="ee">
        <channel name="ee" stack="tcp"/>
    </channels>
    <stacks>
        <stack name="tcp">
            <transport type="TCP" socket-binding="jgroups-tcp"/>
            <protocol type="JDBC_PING">
                <property name="datasource_jndi_name">java:jboss/datasources/KeycloakCluster</property>
                <property name="initialize_sql">
                    CREATE TABLE IF NOT EXISTS JGROUPSPING (
                        own_addr varchar(200) NOT NULL,
                        cluster_name varchar(200) NOT NULL,
                        updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
                        ping_data varbinary(5000) DEFAULT NULL,
                        PRIMARY KEY (own_addr, cluster_name))
                    ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin
                </property>
            </protocol>
            <protocol type="MERGE3"/>
            <protocol type="FD_SOCK"/>
            <protocol type="FD_ALL"/>
            <protocol type="VERIFY_SUSPECT"/>
            <protocol type="pbcast.NAKACK2"/>
            <protocol type="UNICAST3"/>
            <protocol type="pbcast.STABLE"/>
            <protocol type="pbcast.GMS">
                <property name="join_timeout">30000</property>
            </protocol>
            <protocol type="MFC"/>
            <protocol type="FRAG2"/>
        </stack>
    </stacks>
</subsystem>
Solution 1:[1]
Check that your socket-binding for jgroups-tcp specifies the private interface. It should look something like:
<socket-binding name="jgroups-tcp" interface="private" port="7601"/>
Then check that the other instance can establish a connection to this port, for example with a telnet client. If it cannot, check that the EC2 security group rules and ACLs allow connections between the subnets in the availability zones.
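If no telnet client is installed, the same reachability test can be sketched in a few lines of Python. This is only an illustration: the peer address and the port list are assumptions (the address is guessed from the hostname in the question's logs, and 7600 is the port shown there), so replace them with the other instance's private IP and whatever ports your socket bindings actually use.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection performs the TCP handshake, like `telnet host port`.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Hypothetical values: set these to the peer's private IP and JGroups ports.
    peer = "192.168.33.243"
    for port in (7600,):
        state = "reachable" if can_connect(peer, port) else "NOT reachable"
        print(f"{peer}:{port} is {state}")
```

Run it on each instance against the other one. A port that hangs until the timeout usually points to a security group or ACL silently dropping packets, while an immediate refusal suggests nothing is listening on that address and port.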
Secondly, is the lack of a socket binding on FD_SOCK intentional? FD_SOCK may need its own TCP port binding, and that port would also have to be allowed in the EC2 security group and ACL rules.
Solution 2:[2]
The timeouts suggest that the required ports (7600 and 57600) may not be open in the security group(s) on your EC2 instances.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | denov |
