'What is the advantage of saving `.npz` files instead of `.npy` in python, regarding speed, memory and look-up?
The python documentation for the numpy.savez which saves an .npz file is:
The .npz file format is a zipped archive of files named after the variables they contain. The archive is not compressed and each file in the archive contains one variable in .npy format. [...]
When opening the saved .npz file with load a NpzFile object is returned. This is a dictionary-like object which can be queried for its list of arrays (with the .files attribute), and for the arrays themselves.
My question is: what is the point of numpy.savez?
Is it just a more elegant version (shorter command) to save multiple arrays, or is there a speed-up in the saving/reading process? Does it occupy less memory?
Solution 1:[1]
The main advantage is that the arrays are lazy loaded. That is, if you have an npz file with 100 arrays you can load the file without actually loading any of the data. If you request a single array, only the data for that array is loaded.
A downside to npz files is they can't be memory mapped (using load(<file>, mmap_mode='r')), so for large arrays they may not be the best choice. For data where the arrays have a common shape I'd suggest taking a look at structured arrays. These can be memory mapped, allow accessing data with dict-like syntax (i.e., arr['field']), and are very efficient memory wise.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | user2699 |
