Performance issue while using MicroStream

I just started learning MicroStream. After going through the examples published in the MicroStream GitHub repository, I wanted to test its performance with an application that handles more data.

Application source code is available here.

Instructions to run the application and the problems I faced are available here

To summarize, these are my observations:

Loading a file with 2.8+ million records takes 5 minutes.
Calculating statistics on the loaded data fails with an OutOfMemoryError.

Why is MicroStream trying to load all the data (4 GB) into memory? Am I doing something wrong?

microstream

Solution 1:^[1]

MicroStream is not a traditional database. It starts from the concept that all data lives in memory as an object graph, and that object graph is persisted to disk (or other media) when you store it through the StorageManager.

In your case, all the data is in one list, so accessing that list reads every record from disk. The Lazy reference is not useful the way you have used it, because it only guards access to the single list that holds all the data.

Some optimizations you can introduce:

Split the data by vendorId or by day, using a Map<String, Lazy<List>>.
Once a Map value has been processed, remove it from memory again by clearing the lazy reference. See https://docs.microstream.one/manual/5.0/storage/loading-data/lazy-loading/clearing-lazy-references.html
Increase the number of channels to speed up reading and writing the data. See https://docs.microstream.one/manual/5.0/storage/configuration/using-channels.html
Don't store the object graph every 10000 lines; store it once, at the end of loading.
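
To make the first two suggestions concrete, here is a minimal plain-Java sketch of the access pattern. It does not use the MicroStream API: LazyRef is a hypothetical stand-in written for this example that only mimics the load-on-get and clear behaviour of a lazy reference, and the vendor keys and amounts are made-up sample data. The point is that each partition is loaded, processed, and cleared in turn, so only one partition needs to be in memory at a time.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for a lazy reference (NOT the MicroStream class).
// The supplier simulates reading the referenced data from storage.
final class LazyRef<T> {
    private final Supplier<T> loader;
    private T value; // null while the data is not loaded

    LazyRef(Supplier<T> loader) {
        this.loader = loader;
    }

    // Loads the data on first access and keeps it until clear() is called.
    T get() {
        if (value == null) {
            value = loader.get();
        }
        return value;
    }

    // Drops the in-memory copy; the next get() loads it again.
    void clear() {
        value = null;
    }

    boolean isLoaded() {
        return value != null;
    }
}

final class PartitionedStats {
    // Sums the amounts per vendor, clearing each partition after use.
    static Map<String, Double> totalPerVendor(Map<String, LazyRef<List<Double>>> byVendor) {
        Map<String, Double> totals = new HashMap<>();
        for (Map.Entry<String, LazyRef<List<Double>>> entry : byVendor.entrySet()) {
            double sum = 0;
            for (double amount : entry.getValue().get()) { // loads this partition only
                sum += amount;
            }
            totals.put(entry.getKey(), sum);
            entry.getValue().clear(); // release the partition before moving on
        }
        return totals;
    }

    // Made-up sample data: two vendor partitions with a few amounts each.
    static Map<String, LazyRef<List<Double>>> sampleData() {
        Map<String, LazyRef<List<Double>>> byVendor = new HashMap<>();
        byVendor.put("vendor-1", new LazyRef<>(() -> Arrays.asList(10.0, 20.0)));
        byVendor.put("vendor-2", new LazyRef<>(() -> Arrays.asList(5.0, 7.5)));
        return byVendor;
    }
}
```

With the real library, the Map values would be MicroStream Lazy references and the clearing step would follow the linked documentation page; the loop structure stays the same.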

I hope this helps you solve the issues you are facing at the moment.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Rudy De Busscher
