Does a custom ksqlDB UDAF guarantee concurrency (thread safety)?

I'm running 5 ksqlDB instances (on k8s), and each instance runs 3 threads (set in the ksql server properties).

I implemented a UDAF that aggregates a simple map object. In the cluster environment, no data corruption occurred when more than 10,000 records per minute were aggregated through the UDAF. My guess is that UDAFs guarantee concurrency (thread safety). Am I right?

I have one more question. I am currently running ksqlDB in a k8s environment. Will the aggregated data in a ksqlDB table be preserved without loss when an instance restarts?



Solution 1:[1]

To answer your first question, ksqlDB creates a new instance of each UDAF that a query calls and uses that instance in a single-threaded manner; ksqlDB does not re-use UDAF instances.

This means that if you, as the implementer, write a UDAF that does not use global state (for example, mutable static fields), then "yes", your UDAF should be thread-safe.
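To illustrate what "no global state" looks like, here is a minimal sketch of a map-aggregating UDAF similar to the one described in the question. It assumes the four-method shape of ksqlDB's `io.confluent.ksql.function.udaf.Udaf` interface (`initialize`, `aggregate`, `merge`, `map`); to keep the sketch compilable without ksqlDB on the classpath, it declares a local stand-in interface with the same shape. The class name `CountByValueUdaf` is hypothetical. All mutable data lives in the aggregate value, and the UDAF object has no fields at all.

```java
import java.util.HashMap;
import java.util.Map;

// Local stand-in mirroring the shape of io.confluent.ksql.function.udaf.Udaf,
// so this sketch compiles without ksqlDB on the classpath.
interface Udaf<I, A, O> {
  A initialize();
  A aggregate(I current, A aggregate);
  A merge(A aggOne, A aggTwo);
  O map(A agg);
}

// Hypothetical UDAF that counts occurrences of each input string.
// It has no fields: every piece of state is carried in the aggregate value A,
// so there is nothing for two threads or two instances to share.
final class CountByValueUdaf
    implements Udaf<String, Map<String, Long>, Map<String, Long>> {

  @Override
  public Map<String, Long> initialize() {
    return new HashMap<>();
  }

  @Override
  public Map<String, Long> aggregate(String current, Map<String, Long> aggregate) {
    if (current == null) {
      return aggregate; // skip null inputs
    }
    // Copy instead of mutating the incoming aggregate (a defensive choice).
    Map<String, Long> next = new HashMap<>(aggregate);
    next.merge(current, 1L, Long::sum);
    return next;
  }

  @Override
  public Map<String, Long> merge(Map<String, Long> aggOne, Map<String, Long> aggTwo) {
    // Combine two partial aggregates by summing counts per key.
    Map<String, Long> merged = new HashMap<>(aggOne);
    aggTwo.forEach((k, v) -> merged.merge(k, v, Long::sum));
    return merged;
  }

  @Override
  public Map<String, Long> map(Map<String, Long> agg) {
    return agg; // the final result is the map itself
  }
}
```

In a real deployment, these methods would sit in the object returned by a static method annotated with `@UdafFactory`, inside a class annotated with `@UdafDescription`. Copying the map on every call costs an allocation; given the single-threaded use described above, mutating the passed-in aggregate in place is also commonly done.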

For your second question, I believe the answer is "yes". A UDAF's intermediate state (the value returned by its aggregate method) is persisted to state stores, which ksqlDB backs with Kafka changelog topics; that state ought to be restored when a ksqlDB node is restarted.
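As a rough mental model of why restarts are safe, the toy simulation below keeps only the value returned by `aggregate` for each key, the way a state store does. A "restarted" engine gets a brand-new UDAF instance but resumes from the stored values. This is not ksqlDB's actual code: `SumUdaf` and `ToyEngine` are hypothetical, the `Udaf` interface is a local stand-in with the same shape as ksqlDB's, and a plain `Map` stands in for the changelog-backed store.

```java
import java.util.HashMap;
import java.util.Map;

// Local stand-in mirroring the shape of io.confluent.ksql.function.udaf.Udaf.
interface Udaf<I, A, O> {
  A initialize();
  A aggregate(I current, A aggregate);
  A merge(A aggOne, A aggTwo);
  O map(A agg);
}

// Hypothetical stateless UDAF that sums longs.
final class SumUdaf implements Udaf<Long, Long, Long> {
  @Override public Long initialize() { return 0L; }

  @Override
  public Long aggregate(Long current, Long aggregate) {
    return current == null ? aggregate : aggregate + current;
  }

  @Override public Long merge(Long aggOne, Long aggTwo) { return aggOne + aggTwo; }
  @Override public Long map(Long agg) { return agg; }
}

// Toy engine: the store outlives the engine, simulating a state store
// that is rebuilt from its changelog topic after a restart.
final class ToyEngine {
  private final Map<String, Long> store; // stands in for the persisted state store
  private final Udaf<Long, Long, Long> udaf;

  ToyEngine(Map<String, Long> store) {
    this.store = store;
    this.udaf = new SumUdaf(); // a fresh UDAF instance each time the "node" starts
  }

  Long process(String key, Long value) {
    Long current = store.containsKey(key) ? store.get(key) : udaf.initialize();
    Long next = udaf.aggregate(value, current);
    store.put(key, next); // only the aggregate value is persisted
    return udaf.map(next);
  }
}
```

Because the UDAF object itself holds nothing, losing it on restart loses nothing: the running total lives entirely in the stored aggregate.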

That said, technically, in either case, one could write a UDAF that is not thread-safe, or that does something unusual (for example, keeping state outside the aggregate value) that would not be restored properly.
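For contrast, here is a hedged sketch of the kind of UDAF that breaks both guarantees: it keeps counts in a static, unsynchronized `HashMap` instead of in the aggregate value. Every instance on every thread in the JVM shares that map, so concurrent calls race, and because the counts never pass through the aggregate value, none of them reach the state store and they vanish when the process restarts. The class name `SharedStateCountUdaf` is hypothetical, and the `Udaf` interface is a local stand-in with the same shape as ksqlDB's.

```java
import java.util.HashMap;
import java.util.Map;

// Local stand-in mirroring the shape of io.confluent.ksql.function.udaf.Udaf.
interface Udaf<I, A, O> {
  A initialize();
  A aggregate(I current, A aggregate);
  A merge(A aggOne, A aggTwo);
  O map(A agg);
}

// ANTI-PATTERN (hypothetical): state lives in a static field, not in the aggregate.
final class SharedStateCountUdaf implements Udaf<String, Long, Long> {
  // Shared by all instances and all threads; HashMap is not thread-safe.
  static final Map<String, Long> GLOBAL_COUNTS = new HashMap<>();

  @Override public Long initialize() { return 0L; }

  @Override
  public Long aggregate(String current, Long aggregate) {
    // Unsynchronized read-modify-write on shared state: a data race
    // when several stream threads call this at the same time.
    long count = GLOBAL_COUNTS.getOrDefault(current, 0L) + 1;
    GLOBAL_COUNTS.put(current, count);
    // The result ignores `aggregate` and depends on what other instances saw.
    return count;
  }

  @Override public Long merge(Long aggOne, Long aggTwo) { return aggOne + aggTwo; }
  @Override public Long map(Long agg) { return agg; }
}
```

Even single-threaded, a second instance starting from a fresh aggregate sees counts from the first one. Keeping the counts inside the aggregate value (for example, a `Map<String, Long>` aggregate) avoids both the sharing and the restore problem.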

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 GeoJim