'Is there a way to tell which event occurred first in two kafka topics
If I have two topics in kafka, is there a way to tell if one event in one topic "occured" before an event in another topic if they both come in within a millisecond of each other ie they have the same timestamp?
Background: I am building an event sourcing based event drive architecture. Often, when an event occurs in one topic, I need to do a scan to find if a separate event has already occurred in a second topic. Likewise, if the event in the second topic comes in, I need to scan to see if the event in topic one occurred.
In order to not duplicate processing, I need a deterministic way to order the events. If the events are more than 1 millisecond apart, I can just use the timestamp in the event. But, because kafka timestamps only go to the millisecond, when two events occur close together, I can no longer use this approach.
In reality, I don't care which topic occured "first", ie if kafka posted one before another, even if they came in a different order, I don't care. I just need a deterministic way to order them.
In reality, I can use some method, such as arranging the events by topic alphabetically, but was hoping there was a built-in mechanism. (don't want to introduce weird bugs because I always process event A before event B; unlikely, but I've seen it happen)
PS I am open to other ideas. I'm thinking this approach because it was possible in redis streams. However, because of things I can't control, I am restricted to kafka. I do want to avoid using an external data store as then I need to start worrying about data synchronization in there.
Solution 1:[1]
You're going to run into synchronization issues, regardless. For example - you could try using a stream-topic join in Kafka Streams. If the event doesn't exist for the join, then it hasn't happened yet, but then you're reliant on having absolutely zero lag in the consumer processes building that KTable.
You could try storing nanoseconds as part of the value or header when you create the record if you need higher precision, but again, you're going to either need absolute zero lag or very precise consumer poll events with some comparison window as Kafka does not provide any processing guarantees across multiple topics
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
