'Ksql , streams , tables and multiple microservice instances

i'm new to ksql but knows some kafka concepts very well (partitions,consumer group etc)

we want to build a micro service that will listen to events and every 10 minutes will trigger some external computation based on the latest data (example sagemaker model inference job) so for example we have these events in the topic

source time data
source1 2022-01-01 10:00:00 20
source2 2022-01-01 10:00:00 30
source1 2022-01-01 10:05:00 100
source2 2022-01-01 10:05:00 10

and we want to trigger jobs for the latest value from source1=100 and source2=10 - after 10 minutes these should be deleted. (so it will act as sort of a queue for 10 minutes) another important thing is that we don't want to launch computation for every row in the table but for all of them together.

This at first sounds like a Ktable which have only last value per source. so i started writing some code that create the table and reads it , but when i opened 2 ksql CLI (simulate 2 instances of the MS) i understand that both see the same data meaning they will both trigger compute jobs, and obviously this is not what i want. i read a lot about the architecture and that under the hood kafka-streams use consumer groups , but didn't find much info about the client here (java) or any example for that matter that explain how your stream application behave in cluster of MS. any direction would be appreciated.



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source