How to remove duplicate objects with different parameters in an aggregation in MongoDB

[
   { id:1,month:5,year:2020,text:"Completed" },
   { id:2,month:2,year:2021,text:"Pending" },
   { id:3,month:3,year:2020,text:"Completed" },
   { id:4,month:5,year:2020,text:"Pending" },
   { id:5,month:4,year:2022,text:"Pending" },
]

These are the documents in my collection. I need to remove the duplicate objects with the same year and month using aggregation in MongoDB, so that I get:

[
   { id:1,month:5,year:2020,text:"Completed" },
   { id:2,month:2,year:2021,text:"Pending" },
   { id:3,month:3,year:2020,text:"Completed" },
   { id:5,month:4,year:2022,text:"Pending" },
]


Solution 1:[1]

Maybe something like this:

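db.collection.aggregate([
  {
    // Group by month and year, counting the documents and keeping the originals
    $group: {
      _id: {
        month: "$month",
        year: "$year"
      },
      cnt: {
        $sum: 1
      },
      doc: {
        $push: "$$ROOT"
      }
    }
  },
  {
    // Keep only the groups that contain duplicates
    $match: {
      cnt: {
        $gt: 1
      }
    }
  },
  {
    // Skip the first document of each group; the rest are to be deleted
    $project: {
      docsTodelete: {
        $slice: [
          "$doc",
          1,
          {
            $size: "$doc"
          }
        ]
      }
    }
  },
  {
    // Emit one result per document to delete
    $unwind: "$docsTodelete"
  }
]).forEach(function (doc) {
  // Back up the duplicate first, then delete it from the original collection
  db.backup.insertOne(doc.docsTodelete);
  db.collection.deleteOne({ _id: doc.docsTodelete._id });
});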

Explained:

  1. Group the documents by month-year and push the originals to the array doc
  2. Match only the groups that contain duplicates (more than one document)
  3. Slice each group's array to skip its first document, so that one document per month-year stays in the collection
  4. Unwind the array so that each document to be removed becomes its own result
  5. Run a forEach loop that removes the duplicated documents from the collection and stores the removed ones in a backup collection, in case you have doubts later.
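
To see which documents steps 1 to 4 would select, here is a minimal sketch in plain JavaScript that mimics the grouping on the sample data from the question, without touching a database. The helper name findDocsToDelete is hypothetical, and the sample documents use id where the shell code above uses _id. It assumes documents arrive in the order listed; the real pipeline has no $sort stage, so which document of a group is kept depends on the order in which documents reach $group.

```javascript
// Sample documents from the question (plain objects, no database involved).
const docs = [
  { id: 1, month: 5, year: 2020, text: "Completed" },
  { id: 2, month: 2, year: 2021, text: "Pending" },
  { id: 3, month: 3, year: 2020, text: "Completed" },
  { id: 4, month: 5, year: 2020, text: "Pending" },
  { id: 5, month: 4, year: 2022, text: "Pending" },
];

// Hypothetical helper: returns the documents that would be marked for deletion.
function findDocsToDelete(documents) {
  // Step 1: group by year and month, keeping the originals in arrival order.
  const groups = new Map();
  for (const doc of documents) {
    const key = `${doc.year}-${doc.month}`;
    if (!groups.has(key)) {
      groups.set(key, []);
    }
    groups.get(key).push(doc);
  }

  const toDelete = [];
  for (const group of groups.values()) {
    // Step 2: only groups with more than one document contain duplicates.
    if (group.length > 1) {
      // Steps 3 and 4: skip the first document and flatten the rest.
      toDelete.push(...group.slice(1));
    }
  }
  return toDelete;
}

const docsToDelete = findDocsToDelete(docs);

// What would be left in the collection after the forEach loop in step 5.
const deleteIds = new Set(docsToDelete.map((d) => d.id));
const remaining = docs.filter((d) => !deleteIds.has(d.id));

console.log(docsToDelete.map((d) => d.id)); // only id 4 is marked for deletion
console.log(remaining.map((d) => d.id));    // ids 1, 2, 3 and 5 remain
```

With the sample data, only the document with id 4 is marked, which matches the result asked for in the question.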

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1