'Best way to structure deep time-series data in MongoDB?

I'm migrated from MySQL to MongoDB and looking to optimize the storage of my time-series data.

Each item in my database has an array of values for multiple locales. So the full data at a single point in time might look like this:

{ 
  productId: 1,
  locale: 'US',
  votes: [0,2,5],
  date: ISODate('2022-01-01T00:00:00Z')
}

There would be a similar record, with the same date, for multiple locales and each with its own votes array.

The records would be no more frequent than hourly, but probably only every six hours.

It could be a long time before the votes array gets a different value. So that could be several records every day that contain the same value with only different times.

And then multiply this by how many locales I track. Let's say it's US, CA, and AU.

I watched this presentation, so I understand that the series of records are stored as an array under a single document, based on the meta field.

But I have some questions about how to structure and store this as a MongoDB time series.

1. Should I store the locales more deeply in each product's record, thus looking like this?

[{ 
  productId: 1, // This would be the "meta" field
  us: [0,2,5],
  ca: [0,4,1],
  au: [1,0,1],
  timestamp: ISODate('2022-01-01T00:00:00Z')
}]

Or:

[{ 
  productId: 1, // This would be the "meta" field
  locales: {
    us: [0,2,5],
    ca: [0,4,1],
    au: [1,0,1],
  },
  timestamp: ISODate('2022-01-01T00:00:00Z')
}]

Or would it be better to separate each locale as part of the meta, like this?

[{ 
  meta: { productId: 1, locale: 'US' },
  votes: [0,2,5],
  timestamp: ISODate('2022-01-01T00:00:00Z')
},
{ 
  meta: { productId: 1, locale: 'CA' },
  votes: [0,4,1],
  timestamp: ISODate('2022-01-01T00:00:00Z')
},
{ 
  meta: { productId: 1, locale: 'AU' },
  votes: [1,0,1],
  timestamp: ISODate('2022-01-01T00:00:00Z')
}]

2. Is there a good practice to optimize the repeated data, or does MongoDB time series do that already?

So if I have 100 records in a row that all that the same votes: [0,2,5], will MongoDB optimize that? Or what if it's an object that contains multiple unchanged locales but one of them does change while all the others stay the same?

Or should I make my system save a new record only if the votes array has changed since the last record?

And I see that just today, MongoDB 5.3 can fill in gaps.

So maybe I should leave gaps when I would otherwise have repeated data?

Thanks in advance!



Solution 1:[1]

On consideration of this, I've decided that the best way to do this is to give each locale its own record. This seems the most optimal use of storage space by preventing redundancies, and also would give the fastest querying performance by returning only the necessary locale.

In addition to this, I will add a "global" locale to contain the totals of all locales. This allows an overview to be processed with a single query instead of having to combine hundreds of locales every time the overview is needed.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Daniel J. Lewis