Akka.NET cluster intermittent dead letters

We have our cluster running locally (for now) and everything seems to be configured correctly. Our prime calculation messages are distributed over our seednodes. However, we are intermittently losing messages. You can see the behaviour of two runs in the screenshot. Which messages are marked as dead letters isn't consistent at all.

Our messages are always sent the same way and look like this. The last parameter is n, meaning the nth prime to find.

new PrimeCalculationEntry(id, 1, 100000),
new PrimeCalculationEntry(id, 2, 150000),
new PrimeCalculationEntry(id, 3, 200000),
new PrimeCalculationEntry(id, 4, 250000),
new PrimeCalculationEntry(id, 5, 300000),
new PrimeCalculationEntry(id, 6, 350000),
new PrimeCalculationEntry(id, 7, 400000),
new PrimeCalculationEntry(id, 8, 450000)

Our cluster is set up like this: one non-seednode, which hosts a group router, sends messages to two seednodes, which are configured as pool routers.

Non seednode: localhost:0 (random port)

akka {
            actor {
                provider = cluster
                deployment {
                    /commander {
                        router = round-robin-group # routing strategy
                        routees.paths = ["/user/cluster"] # path of routee on each node
                        cluster {
                            enabled = on
                            allow-local-routees = on
                        }
                    }
                }
            }
            remote {
                dot-netty.tcp {
                    port = 0 #let os pick random port
                    hostname = localhost
                }
            }
            cluster {
                seed-nodes = ["akka.tcp://ClusterSystem@localhost:8081", "akka.tcp://ClusterSystem@localhost:8082"]
            }
        }

Seednode 1: localhost:8081 (leader)

akka {
            actor {
                provider = cluster
                deployment {
                    /cluster {
                        router = round-robin-pool
                        nr-of-instances = 10
                    }
                }
            }
            remote {
                dot-netty.tcp {
                    port = 8081
                    hostname = localhost
                }
            }
            cluster {
                seed-nodes = ["akka.tcp://ClusterSystem@localhost:8081"]
            }
        }

Seednode 2: localhost:8082

akka {
            actor {
                provider = cluster
                deployment {
                    /cluster {
                        router = round-robin-pool
                        nr-of-instances = 10
                    }
                }
            }
            remote {
                dot-netty.tcp {
                    port = 8082
                    hostname = localhost
                }
            }
            cluster {
                seed-nodes = ["akka.tcp://ClusterSystem@localhost:8081"]
            }
        }

Can anyone point us in the right direction? Any issues with our configuration? Thank you in advance.

Solution 1:^[1]

I think I know what the issue is here: you don't have any akka.cluster.roles defined, nor is your /commander router configured with the use-role setting. As a result, every Nth message is dropped, because the router treats its own node as a routee (allow-local-routees is on) and that node has no /user/cluster actor present to receive the message.

To fix this properly, we should do the following:

Have all nodes that can process the PrimeCalculationEntry declare akka.cluster.roles=[prime]
Have the node with the /commander router change its HOCON to:

    /commander {
        router = round-robin-group # routing strategy
        routees.paths = ["/user/cluster"] # path of routee on each node
        cluster {
            enabled = on
            allow-local-routees = on
            use-role = "prime"
        }
    }

This will eliminate the dead letters, as the /commander node will no longer be sending messages to itself every N iterations.
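
For the first step, the role goes into the cluster section of each seednode's HOCON. A minimal sketch, assuming the role name "prime" used in the router config above and showing only the cluster section of seednode 1 (everything else in that node's config stays as posted in the question):

```hocon
# Goes inside the top-level akka { } block of seednode 1.
cluster {
    seed-nodes = ["akka.tcp://ClusterSystem@localhost:8081"]
    roles = ["prime"] # marks this node as able to process PrimeCalculationEntry
}
```

The same roles line would go into seednode 2. The commander node declares no role, so a router restricted with use-role = "prime" should not count it as a routee, even with allow-local-routees = on.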

Solution 2:^[2]

I saw the answer from @Aaronontheweb too late. We "fixed" it by setting allow-local-routees to off in the commander's HOCON. But I guess a better solution would be to set the roles correctly, as mentioned in that answer.
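
For reference, a sketch of that workaround, assuming the /commander deployment block from the question with only the allow-local-routees value changed:

```hocon
# Goes inside akka.actor.deployment on the commander node.
/commander {
    router = round-robin-group
    routees.paths = ["/user/cluster"]
    cluster {
        enabled = on
        allow-local-routees = off # never pick the commander's own node as a routee
    }
}
```

This only helps while the commander is the sole node without a /user/cluster actor. Any other node that joins the cluster without that actor would still be selected as a routee, which is why roles look like the more robust option.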

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Aaronontheweb
Solution 2	Stephan Bisschop
