NodeJS: Optimizing Storage of Very Large Objects on Disk

Happy Friday :)

I'm working on storing very large arrays of objects (hundreds of thousands) to disk using NodeJS. It works great on my insanely fast laptop; however, performance on slower devices is a primary consideration, so I thought I'd let you guys take a stab at it to see if I'm missing something obvious or overlooking a better way.

Our ultimate goal is to be able to compare currentArray with previousArray, so we need both current and previous, but we only want to load the data as needed to keep RAM usage as low as possible when the app is idle.

We have to use NodeJS and can't use any local database; the data has to be stored as a regular file.

Here's what I've come up with, using Zlib:

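const fs = require('fs')
const zlib = require('zlib')

let currentArray = [...]
let previousArray = []

// Load the previous snapshot, if one was saved on an earlier run
if (fs.existsSync('last.db')) {
  previousArray = JSON.parse(zlib.gunzipSync(fs.readFileSync('last.db')).toString('utf-8'))
}

// Overwrite the snapshot with the current data, gzip-compressed
fs.writeFileSync('last.db', zlib.gzipSync(JSON.stringify(currentArray)))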

Looking for thoughts, impressions, improvements, or coffee :D



Solution 1:[1]

Consider saving the data as CSV instead of a single JSON string. You can then append each edited row to a .csv file as one comma-separated line.

You can then read it back as a stream using fast-csv.

The advantage is that you do not need to read the entire db into a single variable. Writing/reading the entire file as string -> object -> string has two costs:

  • parsing/stringifying requires CPU and memory
  • storing the result in a variable requires holding everything in memory
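A minimal sketch of the append-a-row idea, using only Node's built-in fs module. The helper names toCsvField and appendCsvRow, and the fixed column list, are hypothetical and not part of any library; the quoting follows the usual CSV convention of doubling embedded quotes.

```javascript
const fs = require('fs');

// Hypothetical helper: quote a field only when it contains a comma,
// a double quote, or a line break. Missing values become empty fields.
function toCsvField(value) {
  const text = value == null ? '' : String(value);
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Hypothetical helper: append one object as a single CSV row,
// writing its properties in the given column order.
function appendCsvRow(file, columns, row) {
  const line = columns.map((name) => toCsvField(row[name])).join(',') + '\n';
  fs.appendFileSync(file, line);
}
```

For example, appendCsvRow('last.csv', ['id', 'name'], { id: 1, name: 'Smith, J' }) adds one row without touching the rows already in the file. Nested objects would need flattening first, which is one reason the JSON-per-line variant may be a better fit for arbitrary objects.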

In fact, I think that with fsPromises.appendFile you could also JSON.stringify each individual object and append it as a separate line in the file, instead of stringifying the entire array of objects.

You can then read the file line by line: https://nodejs.org/api/readline.html#readline_example_read_file_stream_line_by_line
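The one-object-per-line idea could look roughly like this. It is a sketch, not a definitive implementation: appendRecord and forEachRecord are hypothetical names, and it assumes every record is a plain JSON-serializable object (JSON.stringify escapes line breaks inside strings, so each record stays on one line).

```javascript
const fs = require('fs');
const readline = require('readline');

// Append one object as a single JSON line.
async function appendRecord(file, record) {
  await fs.promises.appendFile(file, JSON.stringify(record) + '\n');
}

// Read the file one line at a time and pass each parsed object to onRecord,
// so only one record needs to be held in memory at a time.
async function forEachRecord(file, onRecord) {
  const rl = readline.createInterface({
    input: fs.createReadStream(file),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });
  for await (const line of rl) {
    if (line.trim() === '') continue; // skip blank lines
    onRecord(JSON.parse(line));
  }
}
```

The callback is where a comparison against the current data could happen, one record at a time, instead of materializing previousArray in full.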

I've read on a website that loading an entire 90 MB file into memory using readFileSync uses about 224.5 MB of memory, while readline uses only 6.33 MB, so it's definitely a huge improvement.

PS: I can't share links on SO, but if you google "node readline vs readfile" it will come up first.
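If the gzip compression from the question is still wanted, line-by-line reading can probably be combined with zlib's streaming API. The sketch below assumes a Node version that ships stream/promises (15 or later); writeGzipLines and readGzipLines are hypothetical names, and the memory figures above were not measured with this code.

```javascript
const fs = require('fs');
const zlib = require('zlib');
const readline = require('readline');
const { Readable } = require('stream');
const { pipeline } = require('stream/promises');

// Write records as gzip-compressed JSON lines without building one big string.
async function writeGzipLines(file, records) {
  const lines = Readable.from(
    (function* () {
      for (const record of records) yield JSON.stringify(record) + '\n';
    })()
  );
  await pipeline(lines, zlib.createGzip(), fs.createWriteStream(file));
}

// Decompress on the fly and yield one parsed object per line.
async function* readGzipLines(file) {
  const rl = readline.createInterface({
    input: fs.createReadStream(file).pipe(zlib.createGunzip()),
    crlfDelay: Infinity,
  });
  for await (const line of rl) {
    if (line !== '') yield JSON.parse(line);
  }
}
```

Usage would be along the lines of await writeGzipLines('last.db', currentArray) followed later by for await (const item of readGzipLines('last.db')) { ... }. Unlike the plain-text variant, a gzip file written this way is rewritten as a whole rather than appended to row by row.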


If the luxury of an internet connection is available, I would definitely just load everything into a free-tier MongoDB Atlas cluster instead of a local file.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1