How to handle messages in a Kafka producer if the retries are exhausted

We have to implement a user registration module. We have two services:

  1. Identity service
  2. UserRegistration service

Now, when a user is registered through the UserRegistration service, we want to send the user details to the Identity service so that the user can log in to the system. We are using Kafka to achieve this.

In our case, the UserRegistration service acts as a Kafka producer, and the flow is as follows:

  1. A request comes in to register the user.
  2. Store the user data in the database.
  3. Send a message to Kafka.

Cases:

  1. The producer's request succeeds, i.e. the message is stored in the topic.

    • Return a success message to the end user.
  2. The producer's request fails.

    • Retry sending the message until the retry limit is reached.
  3. The retries are exhausted.

    • What to do in this situation? The message could be lost.

I think that if the retries are exhausted, we should store the message in the database (e.g. in a "failed-messages" table) and have a background service that loops over the failed messages and tries to resend them to Kafka at a fixed interval.
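The proposed fallback could be sketched as follows. This is a minimal, self-contained illustration, not a real implementation: the Kafka call is stubbed out behind a `send` predicate (in practice it would wrap `KafkaProducer.send(...)`), the "failed-messages" table is an in-memory queue, and all class and method names are hypothetical.

```java
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Predicate;

// Sketch: if sending to Kafka fails after all retries, park the message
// in a "failed-messages" store and let a background pass resend it later.
class FailedMessageFallback {
    private final ConcurrentLinkedQueue<String> failedMessages = new ConcurrentLinkedQueue<>();
    private final Predicate<String> send;   // stand-in for the producer; true = success
    private final int maxRetries;

    FailedMessageFallback(Predicate<String> send, int maxRetries) {
        this.send = send;
        this.maxRetries = maxRetries;
    }

    /** Try to publish; on exhausted retries, store in failed-messages. */
    void publish(String message) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            if (send.test(message)) {
                return;                      // message stored in the topic
            }
        }
        failedMessages.add(message);         // retries exhausted: park it
    }

    /** One pass of the background service: retry each parked message. */
    void retryFailedOnce() {
        int n = failedMessages.size();
        for (int i = 0; i < n; i++) {
            String message = failedMessages.poll();
            if (message != null && !send.test(message)) {
                failedMessages.add(message); // still failing, keep for next pass
            }
        }
    }

    List<String> pendingFailures() {
        return List.copyOf(failedMessages);
    }
}
```

In a real service, `retryFailedOnce` would run on a schedule (e.g. via a ScheduledExecutorService) and read from the failed-messages table rather than from memory.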

Please suggest the best practices for handling this case.

Thanks, Saurabh.



Solution 1:[1]

If retries are exhausted...

The retries are typically left at Integer.MAX_VALUE in KafkaProducer, with the limit placed on time (delivery.timeout.ms) rather than on the number of retries.
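For reference, a producer configured this way might look like the sketch below. The property names are standard Kafka producer configs; the broker address and the 2-minute timeout are placeholder values to adjust for your cluster.

```java
import java.util.Properties;

// Sketch of a producer config that bounds delivery by time rather than
// by a retry count: retries stays effectively unlimited, and
// delivery.timeout.ms caps the total time spent on a record.
class ProducerConfigSketch {
    static Properties timeBoundedConfig() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder broker
        props.put("acks", "all");                          // wait for full acknowledgement
        props.put("retries", Integer.MAX_VALUE);           // retry until the timeout hits
        props.put("delivery.timeout.ms", 120_000);         // the real limit: 2 minutes
        props.put("enable.idempotence", true);             // avoid duplicates on retry
        return props;
        // new KafkaProducer<String, String>(props) would use these settings,
        // together with key/value serializer entries.
    }
}
```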

Please suggest to us the best practices to handle this case.

Approach #1

When you store the data in the database, you can use a Kafka source connector such as Debezium to stream the database changes to Kafka, so that you avoid writing a producer in your UserRegistration service at all. This would be a cleaner approach IMO, for several reasons:

  1. You need not make your user wait until the Kafka message is published and acknowledged by the broker.
  2. You need not implement retry logic in your UserRegistration service.
  3. Even if Kafka is down for a while, your users are not impacted.
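For this approach, the registration flow would be reduced to "write the user row to the database", and a connector registered with Kafka Connect would do the publishing. As a rough illustration, a Debezium connector registration (assuming a MySQL database; every hostname, credential, and table name below is a placeholder) might look like:

```json
{
  "name": "user-registration-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "users-db",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "********",
    "database.server.id": "184054",
    "topic.prefix": "userreg",
    "table.include.list": "userdb.users",
    "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
    "schema.history.internal.kafka.topic": "schema-changes.userdb"
  }
}
```

The Identity service would then consume from the change topic (here, `userreg.userdb.users`) instead of a topic written by a hand-rolled producer.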

Approach #2

An alternative approach is to delegate the producing task to a separate thread that runs periodically.

The work of this thread is to check for any updates to the database and push those updates to Kafka. Should anything fail, this thread is responsible for retrying and ensuring that the message lands in Kafka.

One thing to mention about this approach: if you run multiple instances of your UserRegistration service, you need to distribute the records-to-be-sent among the instances so that you don't send duplicates to Kafka. This makes the service stateful, because the instances must coordinate with each other, and it is relatively difficult to implement correctly.
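A single-instance version of this poller could be sketched as below. Again this is a self-contained illustration: the database table is an in-memory map of row id to payload, `send` stands in for the real producer call, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Sketch of Approach #2: a poller that periodically scans the database
// for rows not yet published and pushes them to Kafka; rows that fail
// to send are retried on the next pass.
class OutboxPoller {
    private final Map<Long, String> unsentRows = new LinkedHashMap<>();
    private final Predicate<String> send;   // stand-in for the producer; true = success

    OutboxPoller(Predicate<String> send) {
        this.send = send;
    }

    void recordRegistered(long id, String payload) {
        unsentRows.put(id, payload);        // in SQL terms: INSERT ... sent = false
    }

    /** One polling pass; rows that fail to send stay for the next pass. */
    void pollOnce() {
        List<Long> delivered = new ArrayList<>();
        for (Map.Entry<Long, String> row : unsentRows.entrySet()) {
            if (send.test(row.getValue())) {
                delivered.add(row.getKey()); // in SQL terms: UPDATE ... sent = true
            }
        }
        delivered.forEach(unsentRows::remove);
    }

    int pendingCount() {
        return unsentRows.size();
    }
}
```

In production this pass would be driven by a ScheduledExecutorService or a scheduler such as Quartz, and the coordination problem described above appears as soon as two instances poll the same table.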


Approach #3

If your code is already written so that the UserRegistration service uses a KafkaProducer that produces records in the request-handling thread itself, you can try increasing the delivery timeout (delivery.timeout.ms) to a larger value appropriate for your Kafka cluster and leave retries at Integer.MAX_VALUE. Remember that you still need to ensure your message is delivered somehow. There are two ways:

  1. Async: Trust your timeout and retry settings by setting them to large values; at some point the message should reach Kafka. Be extra cautious about non-retriable errors such as serialization failures or exceeding the buffer memory limit.
  2. Sync: Call get() (or get(time, TimeUnit)) on the returned Future and block your request thread until the message is sent to Kafka. This delays the response to your client.
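The two styles can be sketched as follows. Here `fakeSend` is a stand-in for `KafkaProducer.send(record)`, which likewise returns a Future (and also accepts an optional callback); everything else is illustrative.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Sketch of the two delivery styles from Approach #3.
class DeliveryStyles {
    // Stand-in for producer.send(record): pretends the broker acknowledged.
    static CompletableFuture<String> fakeSend(String message) {
        return CompletableFuture.completedFuture("ack:" + message);
    }

    /** Async: fire and handle the outcome in a callback; no blocking. */
    static void sendAsync(String message) {
        fakeSend(message).whenComplete((metadata, error) -> {
            if (error != null) {
                // a non-retriable failure (serialization, buffer full, ...):
                // log it, or park the message in a failed-messages store
            }
        });
    }

    /** Sync: block the request thread until the broker acknowledges. */
    static String sendSync(String message) {
        try {
            return fakeSend(message).get(30, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new RuntimeException("delivery failed", e);
        }
    }
}
```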

If I had to choose, I would go with Approach #1 because it is clean and straightforward.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: JavaTechnical