How to avoid publishing duplicate data to Kafka via Kafka Connect and Couchbase Eventing when replicating Couchbase data across multiple data centers with XDCR

My buckets are:

  • MyDataBucket: the application saves its data in this bucket.
  • MyEventingBucket: a Couchbase Eventing function extracts the 'currentState' field from documents in MyDataBucket and saves it in this bucket.

I also have a Kafka Couchbase connector that pushes data from MyEventingBucket to a Kafka topic.

When we had a single data center, there was no problem. Now we have three data centers. We replicate our data between them with XDCR and run active-active, so write requests can arrive at any data center.

When data is replicated to the other data centers, the Eventing service runs in every data center, so the same data is pushed to Kafka three times (once per data center) by the Kafka connector.
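To make the duplication concrete, here is a toy model of the pipeline in plain Python. It does not use any Couchbase or Kafka API: the `DataCenter` class, `write_with_xdcr`, the document ID `order::1`, and the list standing in for the Kafka topic are all made up for illustration. It only assumes what is described above: every data center runs its own Eventing function and its own connector, and both react to replicated mutations the same way they react to local writes.

```python
from dataclasses import dataclass, field


@dataclass
class DataCenter:
    name: str
    topic: list  # shared Kafka topic, modeled as a plain list (assumption)
    data_bucket: dict = field(default_factory=dict)      # stands in for MyDataBucket
    eventing_bucket: dict = field(default_factory=dict)  # stands in for MyEventingBucket

    def apply_mutation(self, doc_id, doc):
        """Store a document, whether it came from the application or from XDCR."""
        self.data_bucket[doc_id] = doc
        self._eventing_function(doc_id, doc)

    def _eventing_function(self, doc_id, doc):
        # The local function sees every mutation in its data center,
        # including the ones that arrived through replication.
        state = {"currentState": doc["currentState"]}
        self.eventing_bucket[doc_id] = state
        self._connector_publish(doc_id, state)

    def _connector_publish(self, doc_id, state):
        # The local connector forwards every change in the local eventing bucket.
        self.topic.append((self.name, doc_id, state["currentState"]))


def write_with_xdcr(origin, all_dcs, doc_id, doc):
    """Write in one data center, then replicate to the others (active-active)."""
    origin.apply_mutation(doc_id, doc)
    for dc in all_dcs:
        if dc is not origin:
            dc.apply_mutation(doc_id, dict(doc))


topic = []
dcs = [DataCenter(name, topic) for name in ("dc1", "dc2", "dc3")]
write_with_xdcr(dcs[1], dcs, "order::1", {"currentState": "SHIPPED", "other": 42})

for record in topic:
    print(record)
```

A single application write to `dc2` ends up as three records on the shared topic, one per data center, all carrying the same document ID and state.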

How can we avoid pushing duplicate data to Kafka?

PS: Of course, we could run the Eventing service or the Kafka connector in only one data center, so that data is published to Kafka just once. But this is not a good solution, because we would be affected whenever a problem occurs in that data center. Avoiding that was the main reason for using multiple data centers.
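The drawback of the single-publisher option can be sketched with a toy timeline. This is only an illustration under stated assumptions: `publish_from_single_dc`, the tick values, and the outage window are hypothetical, and the sketch says nothing about whether the held-back changes would be delivered later once the data center recovers.

```python
def publish_from_single_dc(mutations, outage):
    """Split mutations into those forwarded to Kafka and those held back.

    mutations: list of (tick, doc_id, current_state) accepted by any data center.
    outage:    (start, end) ticks during which the publishing data center is down.
    """
    published, held_back = [], []
    for tick, doc_id, current_state in mutations:
        if outage[0] <= tick < outage[1]:
            # The other data centers still accept the write,
            # but nothing forwards it to Kafka during the outage.
            held_back.append((doc_id, current_state))
        else:
            published.append((doc_id, current_state))
    return published, held_back


mutations = [
    (1, "order::1", "CREATED"),
    (5, "order::1", "PAID"),
    (9, "order::1", "SHIPPED"),
]
published, held_back = publish_from_single_dc(mutations, outage=(4, 8))
print("published:", published)
print("held back:", held_back)
```

Each change is published at most once, but the `PAID` update made during the outage does not reach Kafka while the publishing data center is down, even though the other two data centers are healthy.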



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
