Mongo Kafka Connector Collection Listen Limitations

We have several collections in Mongo, split across n tenants, and want the Kafka connector to watch only specific collections.

Below is the relevant line from my mongosource.properties file, where I have added a pipeline filter to listen only to specific collections. It works.

pipeline=[{$match:{"ns.coll":{"$in":["ecom-tesla-cms-instance","ca-tesla-cms-instance","ecom-tesla-cms-page","ca-tesla-cms-page"]}}}]
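Because a hand-edited list of this kind is easy to break with a stray or typographic quote, one option is to generate the line. The following is only a minimal Python sketch under that assumption; the helper name `build_pipeline` is hypothetical and not part of the connector.

```python
import json


def build_pipeline(collections):
    """Return a pipeline string that matches only the given collection names."""
    stage = {"$match": {"ns.coll": {"$in": list(collections)}}}
    # json.dumps always emits plain ASCII double quotes, so the result is valid JSON.
    return json.dumps([stage], separators=(",", ":"))


watched = [
    "ecom-tesla-cms-instance",
    "ca-tesla-cms-instance",
    "ecom-tesla-cms-page",
    "ca-tesla-cms-page",
]

# The line as it would appear in the properties file.
pipeline_line = "pipeline=" + build_pipeline(watched)
print(pipeline_line)
```

Growing the watched set then means appending names to the `watched` list rather than editing the JSON by hand.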

The number of collections to watch may grow to around 200 in the future, so I would like to know the following three things:

  1. Is there a performance impact when one connector listens to a large number of collections?
  2. Is there any limit on the number of collections one connector can watch?
  3. What is the best practice: running one connector that listens to 100 collections, or 10 connectors that listen to 10 collections each?


Solution 1:[1]

Best practice would say to run many connectors, where "many" depends on your ability to maintain the overhead of them all.

The reason is that one connector creates a single point of failure (strictly per task, but only one task should be assigned to any collection at a time, to prevent duplicates). If the Connect task fails with a non-retryable error, it halts the connector's tasks completely, and reading stops for all collections assigned to that connector.
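To illustrate the "many connectors" approach, here is a rough Python sketch that splits a collection list into fixed-size groups and builds one source connector definition per group, in the name/config shape the Kafka Connect REST API accepts. The connector names (`mongo-source-N`), the database name, the connection URI, and the tenant collection names are all placeholder assumptions, and the group size of 10 is just the figure from the question.

```python
import json


def chunk(items, size):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def connector_configs(collections, per_connector, database, uri):
    """Build one connector definition per group of collections."""
    configs = []
    groups = chunk(sorted(collections), per_connector)
    for n, group in enumerate(groups, start=1):
        # Each connector only sees change events for its own group.
        pipeline = [{"$match": {"ns.coll": {"$in": group}}}]
        configs.append({
            "name": f"mongo-source-{n}",  # hypothetical naming scheme
            "config": {
                "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
                "connection.uri": uri,
                "database": database,
                "pipeline": json.dumps(pipeline),
                "tasks.max": "1",
            },
        })
    return configs


# Placeholder inputs: 25 made-up tenant collections, 10 per connector.
collections = [f"tenant{i}-cms-page" for i in range(25)]
configs = connector_configs(collections, 10, "cms", "mongodb://localhost:27017")
for c in configs:
    watched = json.loads(c["config"]["pipeline"])[0]["$match"]["ns.coll"]["$in"]
    print(c["name"], len(watched))
```

Every collection lands in exactly one group, so a failure in one connector stops only that group's collections. The trade-off is more connector definitions to deploy and monitor.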

You could also try Debezium, which might have less resource usage than the Mongo Source Connector since it acts as a replica rather than querying the collection at an interval.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 OneCricketeer