MongoDB Best Practice: updateMany or reference another collection

I have many collections with interrelated documents. I often store the "_id" of a document from another collection and then run an aggregation pipeline with a $lookup stage to stitch the records together.

This habit comes from my background as an SQL developer. It works for me, but it feels wrong, as if I'm offending the NoSQL gods.

The other solution would be to take the document from the referenced collection and embed it in the working collection's document, so the primary document carries its own copy of the referenced data.

However, that means I'll need to run an updateMany whenever the referenced data changes.

So, the religious question of the hour: is it better to use references and aggregation pipelines, or to copy the data into the 'primary' documents and run updateMany whenever it changes?

Example

Take an inventory system as an example, and promote Robert Johnson from Network Guy to Network Manager.

Using References

devices

[
  {
    "_id": "1234",
    "name": "bob-device",
    "address": "10.1.2.3",
    "category": "computer",
    "type": "laptop",
    "owner": "u89823"
  },
  {
    "_id": "1235",
    "name": "switch-01",
    "address": "172.16.0.14",
    "category": "network",
    "type": "switch",
    "owner": "u89823"
  }
]

users

[
  {
    "_id": "u89823",
    "name": {
      "given": "Robert",
      "surname": "Johnson"
    },
    "title": "Network Guy"
  }
]

Queries

Device

db.devices.aggregate([
  {"$match": {"_id": "1234"}},
  {"$lookup": {
    from: 'users',
    localField: 'owner',
    foreignField: '_id',
    as: 'owner'
  }},
  // $lookup always produces an array; unwind it to get a single owner object
  {"$unwind": "$owner"}
])

// Returns
{
  "_id": "1234",
  "name": "bob-device",
  "address": "10.1.2.3",
  "category": "computer",
  "type": "laptop",
  "owner": {
    "_id": "u89823",
    "name": {
      "given": "Robert",
      "surname": "Johnson"
    },
    "title": "Network Guy"
  }
}
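One detail the pipeline has to account for: $lookup always writes its matches into an array field (empty when nothing matches), so getting a single owner object like the one shown requires an extra stage such as $unwind. The following plain-JavaScript sketch mimics both stages over in-memory arrays. The lookup and unwind helpers and the trimmed-down documents are hypothetical, written only to illustrate the semantics; they are not MongoDB APIs.

```javascript
// Hypothetical in-memory stand-in for the $lookup stage: a left outer join
// that always writes its matches into an ARRAY field (empty if none match).
function lookup(docs, { from, localField, foreignField, as }) {
  return docs.map((doc) => ({
    ...doc,
    [as]: from.filter((other) => other[foreignField] === doc[localField]),
  }));
}

// Hypothetical stand-in for { $unwind: "$<field>" }: one output document per
// array element. Documents with an empty array are dropped, which is MongoDB's
// default unless preserveNullAndEmptyArrays is set. Assumes the field is an array.
function unwind(docs, field) {
  return docs.flatMap((doc) => doc[field].map((item) => ({ ...doc, [field]: item })));
}

const users = [
  { _id: "u89823", name: { given: "Robert", surname: "Johnson" }, title: "Network Guy" },
];
const devices = [
  { _id: "1234", name: "bob-device", owner: "u89823" },
  { _id: "1235", name: "switch-01", owner: "u89823" },
];

const joined = lookup(
  devices.filter((d) => d._id === "1234"), // like { $match: { _id: "1234" } }
  { from: users, localField: "owner", foreignField: "_id", as: "owner" }
);
console.log(Array.isArray(joined[0].owner)); // true

const flattened = unwind(joined, "owner");
console.log(flattened[0].owner.title); // Network Guy
```

Without the unwind step, application code would have to read owner[0] instead of owner.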

Update Owner

db.users.updateOne(
  {
    "_id": "u89823"
  }, {
    "$set": {
      "title": "Network Manager"
    }
  }
)
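With references, the promotion touches exactly one document, and every device sees it the next time its owner is looked up. The sketch below imitates that in plain JavaScript with in-memory arrays. The deviceWithOwner helper is hypothetical and only mimics the $match plus $lookup read; it is not a MongoDB API.

```javascript
// Minimal in-memory sketch (hypothetical data and helper, not the MongoDB API):
// with references, the user's title lives in exactly one place.
const users = [
  { _id: "u89823", name: { given: "Robert", surname: "Johnson" }, title: "Network Guy" },
];
const devices = [
  { _id: "1234", name: "bob-device", owner: "u89823" },
  { _id: "1235", name: "switch-01", owner: "u89823" },
];

// Resolve a device's owner at read time, the way the $lookup stage does.
function deviceWithOwner(deviceId) {
  const device = devices.find((d) => d._id === deviceId);
  const owner = users.find((u) => u._id === device.owner);
  return { ...device, owner };
}

// Counterpart of db.users.updateOne({ _id: "u89823" }, { $set: { title: ... } }):
// a single write to a single document.
users.find((u) => u._id === "u89823").title = "Network Manager";

// Both devices pick up the new title on their next read; nothing else changes.
for (const id of ["1234", "1235"]) {
  console.log(id, deviceWithOwner(id).owner.title);
}
// 1234 Network Manager
// 1235 Network Manager
```

The trade-off is that this join work is repeated on every read instead of once per write.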

Using Embedded Documents

devices

[
  {
    "_id": "1234",
    "name": "bob-device",
    "address": "10.1.2.3",
    "category": "computer",
    "type": "laptop",
    "owner": {
      "_id": "u89823",
      "name": {
        "given": "Robert",
        "surname": "Johnson"
      },
      "title": "Network Guy"
    }
  },
  {
    "_id": "1235",
    "name": "switch-01",
    "address": "172.16.0.14",
    "category": "network",
    "type": "switch",
    "owner": {
      "_id": "u89823",
      "name": {
        "given": "Robert",
        "surname": "Johnson"
      },
      "title": "Network Guy"
    }
  }
]

users

[
  {
    "_id": "u89823",
    "name": {
      "given": "Robert",
      "surname": "Johnson"
    },
    "title": "Network Guy"
  }
]

Queries

Device

db.devices.findOne(
  {"_id": "1234"}
)

// Returns
{
  "_id": "1234",
  "name": "bob-device",
  "address": "10.1.2.3",
  "category": "computer",
  "type": "laptop",
  "owner": {
    "_id": "u89823",
    "name": {
      "given": "Robert",
      "surname": "Johnson"
    },
    "title": "Network Guy"
  }
}

Update Owner

db.users.updateOne(
  {
    "_id": "u89823"
  }, {
    "$set": {
      "title": "Network Manager"
    }
  }
);
db.devices.updateMany(
  {
    "owner._id": "u89823"
  }, {
    "$set": {
      "owner.title": "Network Manager"
    }
  }
);
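The cost of the embedded model is that the change has to be fanned out, and the two statements above are separate writes, so a reader that runs between them still sees the old title in devices. The following is a minimal in-memory sketch in plain JavaScript of that window. The data and the setEmbeddedOwnerTitle helper are hypothetical stand-ins, not MongoDB APIs; the helper only imitates what the updateMany call does (match on owner._id, set owner.title, count the documents actually changed).

```javascript
// Each device gets its own copy of the owner, as a real embedded document would.
const makeOwnerCopy = () => ({
  _id: "u89823",
  name: { given: "Robert", surname: "Johnson" },
  title: "Network Guy",
});
const users = [makeOwnerCopy()];
const devices = [
  { _id: "1234", name: "bob-device", owner: makeOwnerCopy() },
  { _id: "1235", name: "switch-01", owner: makeOwnerCopy() },
];

// Hypothetical stand-in for
// db.devices.updateMany({ "owner._id": id }, { $set: { "owner.title": title } }).
// Returns how many documents actually changed, similar to modifiedCount.
function setEmbeddedOwnerTitle(ownerId, title) {
  let modified = 0;
  for (const device of devices) {
    if (device.owner._id === ownerId && device.owner.title !== title) {
      device.owner.title = title;
      modified += 1;
    }
  }
  return modified;
}

// Step 1: update the source of truth (the db.users.updateOne call).
users[0].title = "Network Manager";
// Until step 2 runs, every embedded copy is stale.
console.log(devices.map((d) => d.owner.title)); // [ 'Network Guy', 'Network Guy' ]

// Step 2: fan the change out to every device that embeds this owner.
console.log(setEmbeddedOwnerTitle("u89823", "Network Manager")); // 2
```

Running step 2 again modifies nothing, so a retry after a partial failure is harmless.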

So, the question is: which method is best?

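The first method (using references, similar to a normalized SQL database) offers a simpler way to update records: a promotion is a single updateOne on users, but every read that needs the owner has to go through a $lookup.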

The second method (using embedded documents) offers a simpler query when pulling data out, a plain findOne, but requires multiple updates (updateOne on users plus updateMany on devices) whenever the shared data changes.

Before anyone asks why I use a "users" collection at all: a user account may carry more data that does not need to be present in a "device" document, such as group membership or start/end dates.
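Embedding does not have to mean copying the whole user document. A common middle ground (sometimes called the extended reference pattern) is to embed only the fields a device actually displays, keep owner._id as the link, and leave everything else in users; the fan-out updateMany is then only needed when one of those embedded fields changes. A minimal sketch in plain JavaScript, where the ownerSnapshot helper, the EMBEDDED_OWNER_FIELDS list, and the groups and startDate fields are all hypothetical:

```javascript
// Hypothetical user document with extra data the devices never display.
const user = {
  _id: "u89823",
  name: { given: "Robert", surname: "Johnson" },
  title: "Network Guy",
  groups: ["network-admins"], // hypothetical field
  startDate: "2015-03-01", // hypothetical field
};

// Hypothetical list of the fields worth duplicating into each device.
const EMBEDDED_OWNER_FIELDS = ["_id", "name", "title"];

// Copy only those fields; everything else stays in users, reachable via owner._id.
function ownerSnapshot(u) {
  const picked = Object.fromEntries(EMBEDDED_OWNER_FIELDS.map((f) => [f, u[f]]));
  return JSON.parse(JSON.stringify(picked)); // deep copy so later edits to u don't leak in
}

const device = { _id: "1234", name: "bob-device", owner: ownerSnapshot(user) };
console.log(Object.keys(device.owner)); // [ '_id', 'name', 'title' ]
```

With this split, a change to groups or startDate needs only the single update on users.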



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
