How to read a large sample of images efficiently without overloading RAM?
While training a classification model I pass the input image samples as a NumPy array, but when I try to train on a large dataset I run into a memory error. I currently have 120 GB of memory, and even with that much I hit the error. I've enclosed a code snippet below:
```python
import numpy as np
from scipy import ndimage  # note: scipy.ndimage.imread is deprecated; newer code uses imageio.imread

x_train = np.array([np.array(ndimage.imread(image)) for image in image_list])
x_train = x_train.astype(np.float32)
```
Error traceback:

```
x_train = x_train.astype(np.float32)
numpy.core._exceptions.MemoryError: Unable to allocate 134. GiB for an array
with shape (2512019, 82, 175, 1) and data type float32
```
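The reported figure is consistent with the array's dimensions: 2,512,019 images of 82×175×1 pixels at 4 bytes per float32 element works out to roughly 134 GiB. A quick check:

```python
import numpy as np

# Total bytes = number of elements * bytes per float32 element.
nbytes = np.prod([2512019, 82, 175, 1]) * np.dtype(np.float32).itemsize
print(nbytes / 2**30)  # ≈ 134.3 GiB, matching the error message
```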
How can I fix this issue without increasing RAM? Is there a better way to read the data, for example using a cache or protobuf?
Solution 1:[1]
Haha, this is too funny — this question comes up just as I put my first two 32 GB RAM sticks into my PC today for pretty much the same reason.

At this point it becomes necessary to handle the data differently.

I am not sure what you are using to do the learning, but if it's TensorFlow you can customize your input pipeline.
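A framework-agnostic sketch of such a pipeline is a generator that loads one batch at a time; the helper names here are illustrative, and with TensorFlow you would typically reach for the `tf.data` API instead:

```python
import numpy as np

def batch_generator(paths, batch_size, load_fn):
    """Yield batches of images lazily so only one batch is in RAM at a time."""
    for start in range(0, len(paths), batch_size):
        chunk = paths[start:start + batch_size]
        # Load and convert only the images in the current batch.
        yield np.stack([load_fn(p) for p in chunk]).astype(np.float32)

# Illustrative usage: a fake "loader" stands in for reading a file from disk.
fake_paths = list(range(10))
load = lambda p: np.zeros((82, 175, 1), dtype=np.uint8)
for batch in batch_generator(fake_paths, batch_size=4, load_fn=load):
    print(batch.shape)  # at most 4 images are materialized at once
```

Keras's `model.fit` accepts such generators (or a `tf.keras.utils.Sequence`) directly, so the full dataset never has to exist as one array.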
Anyway, it comes down to correctly analyzing what you want to do with the data and what your environment can handle. If the data is ready to train and you just load it from disk, there is no problem with loading only a portion of it, training on that portion, then moving on to the next one, and so on.

You can split the data into multiple files or load it partially (there are data types/file formats to help with that). You can even optimize this to the point where you read from disk during training and have the next batch ready to go exactly when you need it.
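One such format is a raw binary file accessed through `np.memmap`, which maps the array on disk and reads only the slices you actually index. A minimal sketch (the file name and the one-time preprocessing step are illustrative; the shape is shrunk from the question's 2.5 M images to keep the example small):

```python
import numpy as np
import os
import tempfile

shape = (1000, 82, 175, 1)  # stand-in for the full (2512019, 82, 175, 1) dataset

# One-time step: write the preprocessed dataset to a raw binary file on disk.
path = os.path.join(tempfile.mkdtemp(), "train.dat")
mm = np.memmap(path, dtype=np.float32, mode="w+", shape=shape)
mm[:] = 0.0  # in practice: fill with your decoded, preprocessed images
mm.flush()

# Training time: map the file instead of loading it. Indexing a slice
# pulls only that slice from disk into RAM.
x_train = np.memmap(path, dtype=np.float32, mode="r", shape=shape)
batch = np.asarray(x_train[0:32])  # only 32 images are materialized
print(batch.shape)
```

The same idea underlies formats like HDF5 (via `h5py`) and TFRecord, which add compression and richer metadata on top.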
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | t0b4cc0 |
