How should I handle delete events with MongoDB changeStreams?
TL;DR - How can I improve the handling of delete events with changeStreams and make sure they're only emitted to the relevant users?
Currently, I'm using MongoDB changeStreams to push real-time updates to my React app. While insert and update operations go smoothly, detecting delete events unique to a user's domain is somewhat of a hassle, and I am looking for means to improve my current solution or create a new one around opening a change stream for each user that connects to the app.
Current Setup: All collections, user-independent
My current deployment method works well, and I believe it is scalable. Once the backend server starts, it initializes changeStreams on all of the necessary models to watch. My data architecture is separated by a user's clientID and affiliateID as reference IDs (clientID != affiliateID, two independent IDs).
MongoDB Collections:
Model A (watched)
Model B (watched)
Model C (watched)
Clients
Affiliates
Users
- User A
* clientID
* affiliateID
I have the changeStreams fullDocument option set to updateLookup, so when a change event is detected in one of the watched models, the entire document is sent in the changeStream as data.fullDocument. This is useful, as I can grab the changed document's clientID and affiliateID and emit the change only to the relevant users through socket.io. However, this method is sub-optimal when delete events occur in the watched collections: the document no longer exists, so updateLookup cannot return it to the change stream, and I cannot grab the clientID and affiliateID to emit the change to only the affected users.
Delete events only return the data.documentKey from changeStreams, an object of the following structure:
documentKey: {
  _id: ObjectID,   // ID of the deleted document
  shardKey1: ...,  // shard keys appear only on sharded MongoDB clusters;
  shardKey2: ...,  // otherwise, documentKey contains only _id
  //...
  shardKeyN: ...
}
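To make the gap concrete, here is a minimal sketch of the routing step described above. The helper name `roomsForChange` and the room naming scheme (`client:<id>`, `affiliate:<id>`) are assumptions for illustration, not part of my actual code, and the event objects are reduced to the fields that matter here.

```javascript
// Hypothetical helper: pick the socket.io rooms that should receive a change event.
// The "client:<id>" / "affiliate:<id>" room names are an assumed convention.
function roomsForChange(change) {
  const doc = change.fullDocument;
  if (!doc) {
    // Delete events carry only documentKey, so there are no reference IDs to route on.
    return null;
  }
  return [`client:${doc.clientID}`, `affiliate:${doc.affiliateID}`];
}

// Simplified change events, reduced to the fields used above.
const updateEvent = {
  operationType: 'update',
  documentKey: { _id: 'doc1' },
  fullDocument: { _id: 'doc1', clientID: 'c1', affiliateID: 'a1' },
};
const deleteEvent = {
  operationType: 'delete',
  documentKey: { _id: 'doc1' },
};

console.log(roomsForChange(updateEvent)); // rooms derived from fullDocument
console.log(roomsForChange(deleteEvent)); // null: nothing to route on
```

For inserts and updates the rooms can be derived from the document itself; for deletes the function has nothing to work with, which is the problem this question is about.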
While I do intend to shard my cluster, I believe sharding it on clientID and affiliateID is a poor choice due to the sheer number of shards it would create. Additionally, even though new clients and affiliates are seldom added, the shard key would be monotonically increasing (please correct me here if this reasoning is wrong).
The current work-around is to emit the delete event and the document's ID to all connected users, regardless of client and affiliate, and search their information if a document with a matching _id exists. This poses some issues:
- Once the cluster is sharded, the document's _id field is no longer guaranteed to be unique. As a result, if I continue with the current workaround, there is a possibility that User A's data is deleted from their screen (not MongoDB) in the app when unaffiliated User B deletes some data.
- Once the cluster is sharded, the shard keys still offer no value to the changeStream or the means of emitting those events to the relevant users with socket.io.
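For reference, the client side of the current workaround looks roughly like the sketch below. The function name `applyDelete` and the shape of `localDocs` are placeholders standing in for whatever state the React app holds; the point is only that every connected user receives the deleted _id and checks it against their own data.

```javascript
// Hypothetical client-side handler for a delete event broadcast to every user.
// `localDocs` stands in for the documents currently held in the app's state.
function applyDelete(localDocs, deletedId) {
  // Compare as strings so ObjectID-like values and plain strings both match.
  const remaining = localDocs.filter((doc) => String(doc._id) !== String(deletedId));
  return {
    docs: remaining,
    removed: remaining.length !== localDocs.length, // true if a local match was dropped
  };
}

const localDocs = [
  { _id: 'a1', name: 'first' },
  { _id: 'b2', name: 'second' },
];

console.log(applyDelete(localDocs, 'b2').removed); // true: a matching _id was held locally
console.log(applyDelete(localDocs, 'zz').removed); // false: this user is unaffected
```

Because the match is on _id alone, any user who happens to hold a document with the same _id would lose it from their screen, which is the first issue listed above.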
How can I improve the handling of delete events with changeStreams and make sure they're only emitted to the relevant users?
My original theory was to open a changeStream on each of the collections for each user that connects, so that I could pass it the clientID and affiliateID in the API function call. I see two problems with this. First, as far as I can tell, changeStreams can only be opened on a database or a collection as a whole, not only on the documents that meet certain conditions (i.e. matching the reference IDs). Second, I think this would incur too much of a performance cost on the backend, and it does not scale well with additional affiliates and clients.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
