Akka.NET cluster intermittent dead letters
We have our cluster running locally (for now), and everything seems to be configured correctly. Our prime calculation messages are distributed over our seed nodes. However, we are intermittently losing messages to dead letters. You can see the behaviour of two runs in the screenshot. Which messages end up as dead letters is not consistent between runs.
Our messages are always sent the same way; they look like this. The last parameter is n, i.e. which nth prime to find.
```csharp
new PrimeCalculationEntry(id, 1, 100000),
new PrimeCalculationEntry(id, 2, 150000),
new PrimeCalculationEntry(id, 3, 200000),
new PrimeCalculationEntry(id, 4, 250000),
new PrimeCalculationEntry(id, 5, 300000),
new PrimeCalculationEntry(id, 6, 350000),
new PrimeCalculationEntry(id, 7, 400000),
new PrimeCalculationEntry(id, 8, 450000)
```
Our cluster is set up like this: one non-seed node hosts a group router that sends messages to two seed nodes, each of which hosts a pool router.
Non-seed node: localhost:0 (random port)
```hocon
akka {
  actor {
    provider = cluster
    deployment {
      /commander {
        router = round-robin-group # routing strategy
        routees.paths = ["/user/cluster"] # path of routee on each node
        cluster {
          enabled = on
          allow-local-routees = on
        }
      }
    }
  }
  remote {
    dot-netty.tcp {
      port = 0 # let the OS pick a random port
      hostname = localhost
    }
  }
  cluster {
    seed-nodes = ["akka.tcp://ClusterSystem@localhost:8081", "akka.tcp://ClusterSystem@localhost:8082"]
  }
}
```
Seed node 1: localhost:8081 (leader)
```hocon
akka {
  actor {
    provider = cluster
    deployment {
      /cluster {
        router = round-robin-pool
        nr-of-instances = 10
      }
    }
  }
  remote {
    dot-netty.tcp {
      port = 8081
      hostname = localhost
    }
  }
  cluster {
    seed-nodes = ["akka.tcp://ClusterSystem@localhost:8081"]
  }
}
```
Seed node 2: localhost:8082
```hocon
akka {
  actor {
    provider = cluster
    deployment {
      /cluster {
        router = round-robin-pool
        nr-of-instances = 10
      }
    }
  }
  remote {
    dot-netty.tcp {
      port = 8082
      hostname = localhost
    }
  }
  cluster {
    seed-nodes = ["akka.tcp://ClusterSystem@localhost:8081"]
  }
}
```
Can anyone point us in the right direction? Is there an issue with our configuration? Thank you in advance.
Solution 1:[1]
I think I know what the issue is here: you don't have any akka.cluster.roles defined, and your /commander router isn't configured with the use-role setting. Because allow-local-routees = on and there is no role filter, the /commander node itself counts as one of the routees. As a result, every Nth message is dropped: the router routes it to its own node, which has no /user/cluster actor to receive it.
To fix this properly, do the following:
- Have all nodes that can process the PrimeCalculationEntry declare akka.cluster.roles = ["prime"].
- Have the node with the /commander router change its HOCON to:
```hocon
/commander {
  router = round-robin-group # routing strategy
  routees.paths = ["/user/cluster"] # path of routee on each node
  cluster {
    enabled = on
    allow-local-routees = on
    use-role = "prime"
  }
}
```
This eliminates the dead letters, because the /commander node no longer has the "prime" role and will therefore no longer send every Nth message to itself.
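For the first step, the cluster section of a seed node might look like the fragment below. This is a sketch assuming the role name "prime" used in the answer; the actor and remote sections stay as they are in the question, and seed node 2 would get the same roles line. The /commander node deliberately does not declare "prime", which is what lets use-role exclude it.

```hocon
akka {
  cluster {
    seed-nodes = ["akka.tcp://ClusterSystem@localhost:8081"]
    # Marks this node as a valid target for routers configured with use-role = "prime";
    # only nodes that actually host /user/cluster should declare this role
    roles = ["prime"]
  }
}
```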
Solution 2:[2]
I saw the answer from @Aaronontheweb too late. We "fixed" it by setting allow-local-routees = off in the /commander HOCON, but I guess the better solution is to set the roles correctly, as described in that answer.
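For comparison, the workaround amounts to the following /commander deployment. This is a sketch based on the configuration in the question; only allow-local-routees differs.

```hocon
/commander {
  router = round-robin-group # routing strategy
  routees.paths = ["/user/cluster"] # path of routee on each node
  cluster {
    enabled = on
    # Never select the local node as a routee; it has no /user/cluster actor
    allow-local-routees = off
  }
}
```

The drawback is that, without a role filter, any other node that joins the cluster without a /user/cluster actor would still be treated as a routee, so messages routed to it would again become dead letters. The role-based fix avoids this.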
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Aaronontheweb |
| Solution 2 | Stephan Bisschop |

