Model relations either as an object using the ID as key or as an array of objects

The example below shows two possible document structures for a contact in a contacts collection on MongoDB 3.4. Note the relationship between the contact and the campaigns he belongs to.

Approach A: campaigns is an object that holds each campaign as a key:value pair, where the key is the campaign ID and the value holds the rest of the campaign data.

{
  "first_name": "John", 
  "last_name": "Doe", 
  "user_id": 1170,
  "campaigns": {
    3452: {
      subscription_dt: ISODate("2017-01-28T19:00:00Z"),
      score: 19
    },
    243: {
      subscription_dt: ISODate("2017-01-15T16:45:00Z"),
      score: 27
    }
  }
}

Approach B: campaigns is an array that holds each campaign as an object, with the campaign ID stored in its campaign_id field.

{
  "first_name": "John", 
  "last_name": "Doe", 
  "user_id": 1170,
  "campaigns": [
    {
      campaign_id: 3452,
      subscription_dt: ISODate("2017-01-28T19:00:00Z"),
      score: 19
    },
    {
      campaign_id: 243,
      subscription_dt: ISODate("2017-01-15T16:45:00Z"),
      score: 27
    }
  ]
}

Considering any kind of query on the collection:

  • Which approach is best for queries?
  • Is there any particular query that is harder, or even impossible, to write with one of the approaches?
  • Which approach gives better performance, for example when creating a compound index on user_id and campaign_id?

For the purpose of the analysis, assume that the relationship must be stored in the contact document.



Solution 1:[1]

I would choose approach B; it is the common pattern, and it makes querying the data inside the campaigns array straightforward.
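The following minimal sketch makes the query difference concrete for one lookup: contacts in campaign 3452 with a score above 20. It is plain Node.js JavaScript, the language of the mongo shell. filterA and filterB are hypothetical helper names that build the filter document you would pass to db.contacts.find(...). The matches* functions are a rough in-memory approximation of how those filters evaluate against a trimmed copy of the sample contact; they are not MongoDB's own matcher.

```javascript
// Trimmed copies of the sample contact (only the fields used here).
const contactA = {
  user_id: 1170,
  campaigns: {
    3452: { score: 19 }, // numeric key becomes the string "3452"
    243: { score: 27 },
  },
};
const contactB = {
  user_id: 1170,
  campaigns: [
    { campaign_id: 3452, score: 19 },
    { campaign_id: 243, score: 27 },
  ],
};

// Filter documents you would pass to db.contacts.find(...).
// Approach A: the campaign ID is baked into the field path.
const filterA = (id, min) => ({ [`campaigns.${id}.score`]: { $gt: min } });

// Approach B: $elemMatch applies both conditions to the SAME array element.
const filterB = (id, min) => ({
  campaigns: { $elemMatch: { campaign_id: id, score: { $gt: min } } },
});

// In-memory approximations of how each filter evaluates.
const matchesA = (doc, id, min) => {
  const c = doc.campaigns[String(id)];
  return c !== undefined && c.score > min;
};
const matchesB = (doc, id, min) =>
  doc.campaigns.some((c) => c.campaign_id === id && c.score > min);

// Without $elemMatch, {"campaigns.campaign_id": id, "campaigns.score": {$gt: min}}
// lets each condition be satisfied by a different array element.
const matchesBDotted = (doc, id, min) =>
  doc.campaigns.some((c) => c.campaign_id === id) &&
  doc.campaigns.some((c) => c.score > min);

console.log(JSON.stringify(filterA(3452, 20))); // {"campaigns.3452.score":{"$gt":20}}
console.log(JSON.stringify(filterB(3452, 20))); // {"campaigns":{"$elemMatch":{"campaign_id":3452,"score":{"$gt":20}}}}
console.log(
  matchesA(contactA, 3452, 20),
  matchesB(contactB, 3452, 20),
  matchesBDotted(contactB, 3452, 20)
); // false false true
```

The last case is the usual pitfall with approach B. Without $elemMatch, the campaign_id condition matches campaign 3452 while the score condition matches campaign 243. John Doe is therefore returned even though his score in campaign 3452 is 19.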

You can create an index on campaigns.campaign_id to get better query performance. To answer your question directly, you can also create a compound multikey index on user_id and campaigns.campaign_id. One disadvantage of multikey indexes is that they need more storage than other indexes, because MongoDB creates an index key for each element of the indexed array. In return, they let you query arrays with high performance.
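Here is a sketch of that index, assuming the array shape from approach B. The key specification is what you would pass to db.contacts.createIndex(...) in the mongo shell. indexKeys and totalIndexKeys are hypothetical helpers that model why a multikey index takes more space: each contact contributes one index key per campaign it belongs to. This is a conceptual model, not MongoDB's actual storage format.

```javascript
// Key specification for the compound multikey index on approach B; in the
// mongo shell: db.contacts.createIndex({ user_id: 1, "campaigns.campaign_id": 1 })
const compoundIndexSpec = { user_id: 1, "campaigns.campaign_id": 1 };

// Conceptual model only: one key per array element, so a contact in
// N campaigns adds N keys to the index.
const indexKeys = (doc) => doc.campaigns.map((c) => [doc.user_id, c.campaign_id]);
const totalIndexKeys = (docs) => docs.reduce((n, d) => n + indexKeys(d).length, 0);

// Trimmed copy of the approach B sample contact.
const contactB = {
  user_id: 1170,
  campaigns: [
    { campaign_id: 3452, score: 19 },
    { campaign_id: 243, score: 27 },
  ],
};

console.log(JSON.stringify(indexKeys(contactB))); // [[1170,3452],[1170,243]]
console.log(totalIndexKeys([contactB])); // 2

// A filter with equality on both indexed fields, the kind of lookup this
// index is meant to serve: db.contacts.find(lookup)
const lookup = { user_id: 1170, "campaigns.campaign_id": 3452 };
```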

In approach A, the campaign ID is part of the field name (for example campaigns.3452). To query the data by campaign ID efficiently, you would therefore have to create an index for each campaign ID, which makes no sense (I am not sure whether anyone uses that approach, but I would not). Every new campaign ID would force you to create another index to keep queries fast. There may be a better answer for approach A, but my experience with MongoDB tells me to choose approach B for this question.
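The following minimal sketch shows two drawbacks of approach A; the helper names are hypothetical. First, because the campaign ID is part of the path, each campaign needs its own index spec. The indexed subfield score is only an example. Second, a query across all campaigns, such as "subscribed to any campaign after 2017-01-20", has no fixed field path in approach A. Approach B can use a dotted path. Approach A needs an aggregation that first converts the object with $objectToArray, which is available from MongoDB 3.4.4. The sketch targets MongoDB 3.4, as in the question, so wildcard indexes (added in 4.2) are not considered. The *InMemory functions only approximate the filter logic in plain JavaScript.

```javascript
// Approach A: one index spec per campaign, since the ID is in the path.
// Each spec would be passed to db.contacts.createIndex(...).
const perCampaignIndexSpecs = (campaignIds) =>
  campaignIds.map((id) => ({ [`campaigns.${id}.score`]: 1 }));

// Query: contacts subscribed to any campaign after this date.
const since = new Date("2017-01-20T00:00:00Z");

// Approach B: one dotted path reaches into every array element.
const anyAfterB = { "campaigns.subscription_dt": { $gt: since } };

// Approach A: turn the object into an array of { k, v } pairs, then match.
// For db.contacts.aggregate(...).
const anyAfterA = [
  { $addFields: { campaignList: { $objectToArray: "$campaigns" } } },
  { $match: { "campaignList.v.subscription_dt": { $gt: since } } },
  { $project: { campaignList: 0 } },
];

// In-memory approximations of the same logic (not MongoDB's matcher).
const anyAfterInMemoryA = (doc, d) =>
  Object.values(doc.campaigns).some((c) => c.subscription_dt > d);
const anyAfterInMemoryB = (doc, d) =>
  doc.campaigns.some((c) => c.subscription_dt > d);

// Trimmed copies of the sample contact (only the fields used here).
const contactA = {
  campaigns: {
    3452: { subscription_dt: new Date("2017-01-28T19:00:00Z") },
    243: { subscription_dt: new Date("2017-01-15T16:45:00Z") },
  },
};
const contactB = {
  campaigns: [
    { campaign_id: 3452, subscription_dt: new Date("2017-01-28T19:00:00Z") },
    { campaign_id: 243, subscription_dt: new Date("2017-01-15T16:45:00Z") },
  ],
};

console.log(perCampaignIndexSpecs([3452, 243]).length); // 2
console.log(anyAfterInMemoryA(contactA, since), anyAfterInMemoryB(contactB, since)); // true true
```

In the approach A pipeline, the $match stage filters on a field computed by $addFields. It therefore cannot use an index, whereas the approach B filter can use a multikey index on campaigns.subscription_dt.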

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Hasan Alper Ocalan