'MongoDB Aggregation Limit Lookup

I am using $lookup in PyMongo to successfully "join" two collections (this works). I am having a problem where the second collection I am joining in may exceed the BSON document size when it returns all of the records.

I am looking to use $limit to limit the number of records that are allowed to join under "match_docs" eg: 100 records maximum from "comments" per obj_id:

db.indicators.aggregate([
  {
    "$lookup": {
      "from": "comments",
      "localField": "_id",
      "foreignField": "obj_id",
      "as": "match_docs"
    }
  }
])

I've tried various types of $limit, and it seems to only limit the total number of results overall, not just for the join.



Solution 1:[1]

Starting from MongoDB 3.6 you can use uncorrelated subqueries to limit the lookup:

db.indicators.aggregate([
{ $lookup: {
  from: 'comments',
  as: 'match_docs',
  let: { indicator_id: '$_id' },
  pipeline: [
    { $match: {
      $expr: { $eq: [ '$obj_id', '$$indicator_id' ] }
    } },
    // { $sort: { createdAt: 1 } }, // add sort if needed (for example, if you want first 100 comments by creation date)
    { $limit: 100 }
  ]
} }
])

Solution 2:[2]

If you do a $unwind immediately following a $lookup, the pipeline will be optimized, basically combining the 2 stages helping to bypass the 16MB limit that could result from the $lookup returning a large number of documents.

Keep in mind, if a single document in the foreign collection plus the size of the document in the local collection exceed 16 MB, this optimization cannot help.

Solution 3:[3]

I was able to figure it out.

$lookup -> $match -> $project

db.indicators.aggregate([{
    "$lookup": {
        "from": "comments"
        , "localField": "_id"
        , "foreignField": "obj_id"
        , "as": "match_docs"
    }
}, {
    "$match": {
        "match_docs": {
            "$exists": True
        }
    }
}, {
    "$project": {
        "match_docs_agg": {
            "$slice": ["$match_docs", 3]
        }
    }
}])

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Dan Karbayev
Solution 2 Pete Garafano
Solution 3 gleb1783