Find duplicates and keep the record with the latest field value in MongoDB
I am trying to find duplicates, delete the older documents, and keep only the latest document, based on a field value in MongoDB.
Below is what my collection looks like. I would like to find duplicates based on guid and then keep only the document with the latest (highest) revision number.
```json
{
  "_id": { "$oid": "1201f1196affd5c74ca0af14" },
  "guid": "MEM-bce9",
  "revision": 121,
  "action": "updated"
}
{
  "_id": { "$oid": "2201f1196affd5c74ca0afc4" },
  "guid": "MEM-bce9",
  "revision": 122,
  "action": "deleted"
}
{
  "_id": { "$oid": "3201f1196affd5c74ca0afc4" },
  "guid": "MEM-aXt1",
  "revision": 21,
  "action": "created"
}
{
  "_id": { "$oid": "4201f1196affd5c74ca0afc4" },
  "guid": "MEM-aXt1",
  "revision": 22,
  "action": "updated"
}
{
  "_id": { "$oid": "5201f1196affd5c74ca0afc4" },
  "guid": "MEM-Mwq0",
  "revision": 121,
  "action": "updated"
}
```
Expected output:
```json
{
  "_id": { "$oid": "2201f1196affd5c74ca0afc4" },
  "guid": "MEM-bce9",
  "revision": 122,
  "action": "deleted"
}
{
  "_id": { "$oid": "4201f1196affd5c74ca0afc4" },
  "guid": "MEM-aXt1",
  "revision": 22,
  "action": "updated"
}
{
  "_id": { "$oid": "5201f1196affd5c74ca0afc4" },
  "guid": "MEM-Mwq0",
  "revision": 121,
  "action": "updated"
}
```
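To pin down what "keep the latest" means for the sample data, here is a minimal plain-JavaScript sketch of the rule (one document per guid, the one with the highest revision). It assumes each `_id` is written as a plain hex string rather than an ObjectId, and `keepLatest` is a hypothetical helper name, not a MongoDB API. It is only a reference for checking a pipeline's result against the expected output above.

```javascript
// Sample documents from the question, with each ObjectId written as a hex string
const docs = [
  { _id: '1201f1196affd5c74ca0af14', guid: 'MEM-bce9', revision: 121, action: 'updated' },
  { _id: '2201f1196affd5c74ca0afc4', guid: 'MEM-bce9', revision: 122, action: 'deleted' },
  { _id: '3201f1196affd5c74ca0afc4', guid: 'MEM-aXt1', revision: 21, action: 'created' },
  { _id: '4201f1196affd5c74ca0afc4', guid: 'MEM-aXt1', revision: 22, action: 'updated' },
  { _id: '5201f1196affd5c74ca0afc4', guid: 'MEM-Mwq0', revision: 121, action: 'updated' },
];

// Hypothetical helper: keep only the highest-revision document per guid.
// On a tie in revision, the first document seen is kept.
function keepLatest(documents) {
  const latest = new Map(); // guid -> document with the highest revision so far
  for (const doc of documents) {
    const current = latest.get(doc.guid);
    if (current === undefined || doc.revision > current.revision) {
      latest.set(doc.guid, doc);
    }
  }
  return [...latest.values()];
}

const kept = keepLatest(docs);
// Logs the three _id values listed in the expected output
console.log(kept.map((d) => d._id));
```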
Solution 1:[1]
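Let's take the simplest case: in your collection, the _id of the document with revision x (an ObjectId, whose leading bytes are a creation timestamp) is smaller than the _id of the document with revision x+1. If the documents reach $group in that order, the query below works exactly as you need. MongoDB does not guarantee natural order, though, so it is safer to add a $sort stage before $group.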
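The premise that _id order matches revision order can be checked on the data itself. This is a minimal sketch, assuming the _id values are available as 24-character hex strings: the first 8 hex characters of an ObjectId are its creation time in seconds since the Unix epoch (in mongosh, `ObjectId(...).getTimestamp()` returns the same time as a Date). `objectIdSeconds` and `idOrderMatchesRevision` are hypothetical helper names. Documents created within the same second share a timestamp, so this check cannot order them.

```javascript
// Hypothetical helper: creation time (seconds since the Unix epoch) stored in
// the first 4 bytes, i.e. the first 8 hex characters, of an ObjectId.
function objectIdSeconds(hexId) {
  return parseInt(hexId.slice(0, 8), 16);
}

// Hypothetical helper: true if, within every guid, ordering documents by their
// _id timestamp also orders them by strictly ascending revision.
function idOrderMatchesRevision(documents) {
  const byGuid = new Map(); // guid -> documents sharing that guid
  for (const doc of documents) {
    if (!byGuid.has(doc.guid)) byGuid.set(doc.guid, []);
    byGuid.get(doc.guid).push(doc);
  }
  for (const group of byGuid.values()) {
    group.sort((a, b) => objectIdSeconds(a._id) - objectIdSeconds(b._id));
    for (let i = 1; i < group.length; i++) {
      if (group[i].revision <= group[i - 1].revision) return false;
    }
  }
  return true;
}

// _id, guid and revision values from the question's sample data
const sample = [
  { _id: '1201f1196affd5c74ca0af14', guid: 'MEM-bce9', revision: 121 },
  { _id: '2201f1196affd5c74ca0afc4', guid: 'MEM-bce9', revision: 122 },
  { _id: '3201f1196affd5c74ca0afc4', guid: 'MEM-aXt1', revision: 21 },
  { _id: '4201f1196affd5c74ca0afc4', guid: 'MEM-aXt1', revision: 22 },
  { _id: '5201f1196affd5c74ca0afc4', guid: 'MEM-Mwq0', revision: 121 },
];
console.log(idOrderMatchesRevision(sample));
```

If this returns false for real data, relying on _id or natural order is not enough and an explicit sort on revision is required.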
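```javascript
// "collection" is a placeholder for your collection's name
db.collection.aggregate([
  // Make the revision order explicit; natural order is not guaranteed
  { $sort: { revision: 1 } },
  // One output document per guid, with all of its documents in an array
  { $group: { _id: '$guid', items: { $push: '$$ROOT' } } },
  // The last array element is the highest revision ($last array operator, MongoDB 4.4+)
  { $addFields: { lastRevision: { $last: '$items' } } }
])
```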
Let's go through this: $group creates one document per guid, and every record in the collection that shares that guid is pushed into its items array. Since the most recent revision sits at a higher index than the earlier revisions, you just pick the last item of the array.
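The pipeline only identifies the latest document per guid; the question also asks to delete the older ones. A minimal sketch of that last step, assuming the aggregation results have the shape `{ _id: <guid>, items: [...] }` with items in ascending revision order: every item except the last is an older revision. `idsToDelete` is a hypothetical helper name. It slices by position rather than comparing `_id` values, because two ObjectId instances holding the same id are not `===` equal in JavaScript.

```javascript
// Hypothetical helper: collect the _id of every document except the last
// (highest-revision) one in each group produced by the $group stage.
function idsToDelete(groups) {
  return groups.flatMap((group) => group.items.slice(0, -1).map((doc) => doc._id));
}

// Plain objects mirroring the aggregation output for the sample data
// (the real output order of groups is not guaranteed)
const groups = [
  { _id: 'MEM-bce9', items: [{ _id: '1201f1196affd5c74ca0af14', revision: 121 }, { _id: '2201f1196affd5c74ca0afc4', revision: 122 }] },
  { _id: 'MEM-aXt1', items: [{ _id: '3201f1196affd5c74ca0afc4', revision: 21 }, { _id: '4201f1196affd5c74ca0afc4', revision: 22 }] },
  { _id: 'MEM-Mwq0', items: [{ _id: '5201f1196affd5c74ca0afc4', revision: 121 }] },
];
console.log(idsToDelete(groups));
```

In mongosh, the ids could then be removed with something like `db.collection.deleteMany({ _id: { $in: idsToDelete(db.collection.aggregate(pipeline).toArray()) } })`, where `collection` and `pipeline` are placeholders for your collection and the pipeline above. For large collections, pushing only `'$_id'` instead of `'$$ROOT'` keeps each group much smaller.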
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Ilan Toren |
