MongoDB Replicaset - How to solve "Could not find self in current config" on Secondary node
I am running a MongoDB replica set (v4.4.14) on VMs, configured as follows:
- 1 Primary
- 2 Secondary
- 1 Arbiter
The nodes are not on the same local network, so they use public hostnames configured in DNS.
I am now trying to add another (delayed) secondary, but I get the following error:
ReplicaSet Status:
rs.status().members
{
  "_id" : 5,
  "name" : "xxxx-4.xxx.xx:27123",
  "health" : 0,
  "state" : 8,
  "stateStr" : "(not reachable/healthy)",
  "uptime" : 0,
  ...
  "pingMs" : NumberLong(31),
  "lastHeartbeatMessage" : "Our replica set configuration is invalid or does not include us",
  "syncSourceHost" : "",
  "syncSourceId" : -1,
  "infoMessage" : "",
  "configVersion" : -1,
  "configTerm" : -1
}
MongoDB Logs on the Primary
{"t":{"$date":"2022-05-14T23:20:34.328+02:00"},"s":"I", "c":"REPL_HB", "id":23974, "ctx":"ReplCoord-16984","msg":"Heartbeat failed after max retries","attr":{"target":"xxxx-4.xxx.xx:27123","maxHeartbeatRetries":2,"error":{"code":93,"codeName":"InvalidReplicaSetConfig","errmsg":"Our replica set configuration is invalid or does not include us"}}}
MongoDB Logs on the Secondary
{"t":{"$date":"2022-05-14T21:21:58.661+00:00"},"s":"I", "c":"REPL", "id":3564900, "ctx":"ReplCoord-94","msg":"Could not find self in current config, retrying DNS resolution of members","attr":{"target":"xxxx-2.xxx.xx:27123","currentConfig":{"_id":"rs0","version":133097,"protocolVersion":1,"writeConcernMajorityJournalDefault":true,"members":[{"_id":1,"host":"xxxx-1.xxx.xx:27123","arbiterOnly":false,"buildIndexes":true,"hidden":false,"priority":0.6,"tags":{},"slaveDelay":0,"votes":1},{"_id":2,"host":"xxxx-2.xxx.xx:27123","arbiterOnly":false,"buildIndexes":true,"hidden":false,"priority":0.1,"tags":{},"slaveDelay":0,"votes":1},{"_id":3,"host":"xxxx-3.xxx.xx:27123","arbiterOnly":true,"buildIndexes":true,"hidden":false,"priority":0.0,"tags":{},"slaveDelay":0,"votes":1},{"_id":4,"host":"xxxx-6.xxx.xx:27123","arbiterOnly":false,"buildIndexes":true,"hidden":false,"priority":0.3,"tags":{},"slaveDelay":0,"votes":1},{"_id":5,"host":"xxxx-4.xxx.xx:27123","arbiterOnly":false,"buildIndexes":true,"hidden":false,"priority":0.0,"tags":{},"slaveDelay":3600,"votes":0}],"settings":{"chainingAllowed":false,"heartbeatIntervalMillis":3000,"heartbeatTimeoutSecs":15,"electionTimeoutMillis":10000,"catchUpTimeoutMillis":-1,"catchUpTakeoverDelayMillis":30000,"getLastErrorModes":{},"getLastErrorDefaults":{"w":1,"wtimeout":0},"replicaSetId":{"$oid":"5df4f4f01223ca52c6ab5ebe"}}}}}
MongoDB Configuration
rs0:PRIMARY> rs.conf()
{
  "_id" : "rs0",
  "version" : 133097,
  "protocolVersion" : NumberLong(1),
  "writeConcernMajorityJournalDefault" : true,
  "members" : [
    {
      "_id" : 1,
      "host" : "xxxx-1.xxx.xx:27123",
      "arbiterOnly" : false,
      "buildIndexes" : true,
      "hidden" : false,
      "priority" : 0.6,
      "tags" : {
      },
      "slaveDelay" : NumberLong(0),
      "votes" : 1
    },
    {
      "_id" : 2,
      "host" : "xxxx-2.xxx.xx:27123",
      "arbiterOnly" : false,
      "buildIndexes" : true,
      "hidden" : false,
      "priority" : 0.1,
      "tags" : {
      },
      "slaveDelay" : NumberLong(0),
      "votes" : 1
    },
    {
      "_id" : 3,
      "host" : "xxxx-3.xxx.xx:27123",
      "arbiterOnly" : true,
      "buildIndexes" : true,
      "hidden" : false,
      "priority" : 0,
      "tags" : {
      },
      "slaveDelay" : NumberLong(0),
      "votes" : 1
    },
    {
      "_id" : 4,
      "host" : "xxxx-6.xxx.xx:27123",
      "arbiterOnly" : false,
      "buildIndexes" : true,
      "hidden" : false,
      "priority" : 0.3,
      "tags" : {
      },
      "slaveDelay" : NumberLong(0),
      "votes" : 1
    },
    {
      "_id" : 5,
      "host" : "xxxx-4.xxx.xx:27123",
      "arbiterOnly" : false,
      "buildIndexes" : true,
      "hidden" : false,
      "priority" : 0,
      "tags" : {
      },
      "slaveDelay" : NumberLong(3600),
      "votes" : 0
    }
  ],
  "settings" : {
    "chainingAllowed" : false,
    "heartbeatIntervalMillis" : 3000,
    "heartbeatTimeoutSecs" : 15,
    "electionTimeoutMillis" : 10000,
    "catchUpTimeoutMillis" : -1,
    "catchUpTakeoverDelayMillis" : 30000,
    "getLastErrorModes" : {
    },
    "getLastErrorDefaults" : {
      "w" : 1,
      "wtimeout" : 0
    },
    "replicaSetId" : ObjectId("5df4f4f01223ca52c6ab5ebe")
  }
}
I can see that the error is handled in this file:
https://github.com/mongodb/mongo/blob/master/src/mongo/db/repl/topology_coordinator.cpp
It probably means that the secondary doesn't recognise its own hostname as 'self' and therefore reports that it is not part of the config. However:
- I can traceroute from the secondary and the route comes back to self
- I can connect via mongo shell from the secondary to all other nodes and from all other nodes into the secondary
- I can ssh from outside into the secondary node
- I CANNOT mongo shell or ssh using the public DNS hostname from within the secondary node
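The last point can be checked without MongoDB involved. Below is a minimal Python sketch, assuming it is run on the secondary itself. The helper names `is_local_address` and `hostname_is_self` are made up for illustration and are not part of MongoDB. The sketch resolves a hostname and tests whether any resulting address can be bound locally, which only succeeds for addresses assigned to this machine. It is an approximation of the idea of "does this name point at me", not a copy of mongod's own check.

```python
import socket


def is_local_address(ip):
    """Return True if `ip` can be bound on this machine, i.e. it is a local address."""
    family = socket.AF_INET6 if ":" in ip else socket.AF_INET
    try:
        with socket.socket(family, socket.SOCK_STREAM) as sock:
            sock.bind((ip, 0))  # port 0 lets the OS pick any free port
        return True
    except OSError:
        # Typically EADDRNOTAVAIL: the address belongs to some other host.
        return False


def hostname_is_self(hostname):
    """Resolve `hostname` and report whether any resolved address is local."""
    try:
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return False  # the name does not resolve at all
    addresses = {info[4][0] for info in infos}
    return any(is_local_address(ip) for ip in addresses)
```

Calling `hostname_is_self("xxxx-4.xxx.xx")` on the affected node (with the real hostname substituted) would return False if the public DNS name resolves to an address that is not assigned to any local interface, for example a public IP that is NATed to the VM.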
This node used to be part of the configuration without issues. It was cleanly removed, and now this issue has appeared.
Any pointers as to what the issue could be?
Solution 1:[1]
This had me wandering around for about a month.
However, as I was writing the question, that last point (not being able to connect to the public hostname from within the secondary itself) got me thinking.
The issue was exactly what it said on the tin: the replica couldn't connect to itself using the public hostname, so it could not find itself in the config.
The dirty solution: I added the public hostname to the /etc/hosts file on the secondary, pointing to the local IP (overriding the DNS response). This solved the issue.
I still need to figure out why I cannot connect via the public hostname from the node itself, but that is a different issue at this point.
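For anyone who wants to script the /etc/hosts change, here is a minimal Python sketch of the idea. The function name `ensure_hosts_entry`, the address `10.0.0.4` and the name `db.example.test` are placeholders for illustration, not values from my setup. The function only transforms text, so writing the result back to /etc/hosts (which needs root) is left as a manual step.

```python
def ensure_hosts_entry(hosts_text, ip, hostname):
    """Return hosts-file text in which `hostname` maps to `ip` and nothing else."""
    out = []
    mapped = False
    for line in hosts_text.splitlines():
        # Ignore trailing comments; fields[0] is the address, the rest are names.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and hostname in fields[1:]:
            if fields[0] == ip:
                mapped = True  # the wanted mapping is already present
            else:
                # Conflicting mapping: keep the other names, drop this hostname.
                others = [name for name in fields[1:] if name != hostname]
                if others:
                    out.append(" ".join([fields[0]] + others))
                continue
        out.append(line)
    if not mapped:
        out.append(f"{ip} {hostname}")
    return "\n".join(out) + "\n"
```

For example, `ensure_hosts_entry(open("/etc/hosts").read(), "10.0.0.4", "db.example.test")` returns the new file content, which can be reviewed before it replaces the original. Running it a second time on its own output changes nothing.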
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Ylli Prifti |
