Performance issue while using MicroStream
I just started learning MicroStream. After going through the examples published in the MicroStream GitHub repository, I wanted to test its performance with an application that handles a larger amount of data.
Application source code is available here.
Instructions to run the application and the problems I faced are available here
To summarize, these are my observations:
- Loading a file with 2.8+ million records takes 5 minutes.
- While calculating statistics based on the loaded data, the application fails with an OutOfMemoryError.
Why is MicroStream trying to load all the data (4 GB) into memory? Am I doing something wrong?
Solution 1:[1]
MicroStream is not like a traditional database: it starts from the concept that all data lives in memory as an object graph. That object graph is written to disk (or another storage medium) when you store it through the StorageManager.
In your case, all data is in a single list, so accessing that list reads every record from disk. The Lazy reference doesn't help the way you have used it, because it only guards access to that one list containing all the data.
Some optimizations you can introduce:
- Split the data by vendorId or by day, using a Map<String, Lazy<List>>, so that each part can be loaded on its own.
- Once a Map value has been processed, remove it from memory again by clearing the lazy reference. See https://docs.microstream.one/manual/5.0/storage/loading-data/lazy-loading/clearing-lazy-references.html
- Increase the number of channels to speed up reading and writing the data. See https://docs.microstream.one/manual/5.0/storage/configuration/using-channels.html
- Don't store the object graph every 10000 lines; store it once, at the end of loading.
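To make the first two points concrete, here is a minimal plain-Java sketch of the idea: process one partition at a time and release it before touching the next. It deliberately does not use the MicroStream API. `LazyRef` is a hypothetical stand-in that only mimics "load on first access" and "clear to release memory", and `Trip`, `fare`, and the sample values are invented for illustration. In a real application the Lazy reference from MicroStream would presumably take the place of `LazyRef`, with the storage doing the loading.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Supplier;

// Hypothetical stand-in for a lazy reference. This is NOT MicroStream's Lazy class;
// it only mimics "load on first access" and "clear to release memory".
final class LazyRef<T> {
    private final Supplier<T> loader; // simulates reading the value from storage
    private T value;                  // null while the value is not held in memory

    LazyRef(Supplier<T> loader) {
        this.loader = loader;
    }

    T get() {
        if (value == null) {
            value = loader.get(); // load on demand
        }
        return value;
    }

    void clear() {
        value = null; // drop the in-memory copy; the next get() reloads it
    }

    boolean isLoaded() {
        return value != null;
    }
}

// Hypothetical record type; the real application defines its own fields.
final class Trip {
    final String vendorId;
    final double fare;

    Trip(String vendorId, double fare) {
        this.vendorId = vendorId;
        this.fare = fare;
    }
}

final class PartitionedStatsSketch {

    // Sums fares one vendor partition at a time, so that at most one
    // partition needs to be in memory at any moment.
    static Map<String, Double> totalFarePerVendor(Map<String, LazyRef<List<Trip>>> byVendor) {
        Map<String, Double> totals = new TreeMap<String, Double>();
        for (Map.Entry<String, LazyRef<List<Trip>>> entry : byVendor.entrySet()) {
            LazyRef<List<Trip>> partition = entry.getValue();
            double sum = 0.0;
            for (Trip trip : partition.get()) { // loads only this vendor's records
                sum += trip.fare;
            }
            totals.put(entry.getKey(), sum);
            partition.clear(); // release this partition before moving on
        }
        return totals;
    }

    // Builds two small partitions whose contents are created only when requested.
    static Map<String, LazyRef<List<Trip>>> sampleData() {
        Map<String, LazyRef<List<Trip>>> byVendor = new TreeMap<String, LazyRef<List<Trip>>>();
        byVendor.put("1", new LazyRef<List<Trip>>(() -> {
            List<Trip> trips = new ArrayList<Trip>();
            trips.add(new Trip("1", 10.0));
            trips.add(new Trip("1", 20.0));
            return trips;
        }));
        byVendor.put("2", new LazyRef<List<Trip>>(() -> {
            List<Trip> trips = new ArrayList<Trip>();
            trips.add(new Trip("2", 5.5));
            return trips;
        }));
        return byVendor;
    }

    public static void main(String[] args) {
        Map<String, Double> totals = totalFarePerVendor(sampleData());
        System.out.println(totals);
    }
}
```

The design point is that the statistics loop never holds more than one partition: each list is loaded, reduced to a single number, and cleared. With one big list behind a single lazy reference, the same loop would have to pull every record into memory at once.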
Hope this helps you solve the issues you have at the moment
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Rudy De Busscher |