Kafka Topic - filter and dispatch messages
Background
Our software solution collects data ("events") per customer.
A small fraction of customers (~3%) ask to have this data delivered into "their systems" (they pay for this service).
A target system to which we need to send those events might be:
- AWS S3
- Azure Storage
- Splunk
- DataDog
- More target systems to come in the future.
All of the target systems above have well-known Kafka Connect sink connectors, so the idea is to use those connectors to export the data.
Possible Solution
- All customer events go to one "input" topic
- Custom software consumes messages from the Kafka "input" topic
- The software looks at the message attributes and, based on the value of one of them (let's call it customer_id), decides whether the message should be dropped or published to another Kafka topic named '<customer_id>_topic'.
The destination topic will probably be part of a different cluster. I understand this can be easily done using Kafka Streams.
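Whichever runtime ends up doing the dispatch, the per-message decision itself is simple. A minimal sketch in Python of the drop-or-route logic described above (the function and variable names are illustrative, not part of any Kafka API):

```python
def route_event(event, exporting_customers):
    """Decide where a consumed event should go.

    Returns the destination topic name ('<customer_id>_topic') if the
    customer pays for the export service, or None to drop the event.
    """
    customer_id = event.get("customer_id")
    if customer_id in exporting_customers:
        return f"{customer_id}_topic"
    return None  # the other ~97% of events are dropped


# Example: only customer "acme" has the export service enabled
exporting = {"acme"}
print(route_event({"customer_id": "acme", "payload": "..."}, exporting))   # acme_topic
print(route_event({"customer_id": "other", "payload": "..."}, exporting))  # None
```

In a Kafka Streams topology the same decision would sit in a filter followed by a dynamic topic-name extractor; in a custom consumer it would run per polled record before producing.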
Note that I am aware of the thread Disperse messages in Kafka stream
My question is - can it be done using Kafka Connect and SMT?
I am looking for a "managed" solution: since our Kafka runs in AWS MSK, I would not need to manage the Kafka Connect cluster. With Kafka Streams I would have to install my software on EC2 / ECS - wouldn't I?
Solution 1:[1]
The destination topic will probably be part of a different cluster. I understand this can be easily done using Kafka Streams
Kafka Streams can/should only write to the same cluster. It cannot guarantee delivery to others.
For sending data to other clusters, MirrorMaker would be a starting point.
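As an illustration, a MirrorMaker 2 configuration along these lines could replicate the per-customer export topics to the destination cluster (the cluster aliases, bootstrap servers, and topic pattern are placeholders, not values from the question):

```properties
# connect-mirror-maker.properties (illustrative values)
clusters = ours, customer
ours.bootstrap.servers = ours-broker:9092
customer.bootstrap.servers = customer-broker:9092

# replicate only the per-customer export topics, one direction
ours->customer.enabled = true
ours->customer.topics = .*_topic
```

Note that by default MirrorMaker 2 prefixes replicated topic names with the source cluster alias (e.g. ours.acme_topic); a custom ReplicationPolicy is needed if the destination side expects the original names.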
As you might know, RegexRouter can rename the topic, but it cannot pull dynamic field values out of the record to build the topic name - you'd need to write your own transform for that.
You should also be able to use the Filter transform to inspect/drop events, but out of the box this only works on top-level fields, not nested ones.
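For reference, this is what the static RegexRouter rename looks like in a sink connector configuration (the transform alias and the regex/replacement values are illustrative). It can rewrite an existing topic name, but it has no access to record field values such as customer_id:

```json
"transforms": "route",
"transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.route.regex": "(.*)_topic",
"transforms.route.replacement": "export_$1"
```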
Overall, I find having one topic name "per id" a bad design, assuming you might (eventually) have tens to thousands of ids.
Alternatively, managing tens to thousands of clusters "per customer" (or at least sectioning off clusters with quotas "per client", though it is unclear how multi-tenancy would work with duplicated topic names) might be difficult too - but that is basically what MSK and Confluent Cloud are doing.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Stack Overflow |
