Loading large files into memory with Python
When working with large files and datasets (usually 1–2 GB or more), the process is killed due to running out of RAM. What tools and methods are available to save memory while still allowing the necessary operations, such as iterating over the entire file and accessing and assigning other large variables? Because I need read access to the whole file, I am unsure what solutions apply to this problem. Thanks for any help.
For reference, the project I am currently encountering this problem in is right here (dev branch).
Solution 1:
Generally, you can use memory-mapped files so as to map a section of virtual memory onto a file in a storage device. This enables you to operate on a huge span of memory-mapped data that would not fit in RAM. Note that this is significantly slower than RAM though (there is no free lunch). You can use NumPy to do that quite transparently with numpy.memmap. Alternatively, there is the standard mmap module. For the sake of performance, operate on chunks and read/write each chunk only once from the memory-mapped section.
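As a minimal sketch of the numpy.memmap approach described above (the file name, dtype, and shape below are placeholder assumptions, not taken from the question), you might iterate over the mapped array in chunks so that only one chunk is resident in RAM at a time:

```python
import numpy as np

# Hypothetical example: "data.bin" is an existing binary file of float64 values.
# mode="r" maps it read-only; the OS pages data in only as it is touched.
arr = np.memmap("data.bin", dtype=np.float64, mode="r", shape=(1_000_000_000,))

chunk_size = 10_000_000  # tune to the RAM you can afford per chunk
total = 0.0
for start in range(0, arr.shape[0], chunk_size):
    # Slicing a memmap reads only that region from disk on demand,
    # so peak memory stays roughly bounded by one chunk.
    chunk = arr[start:start + chunk_size]
    total += float(np.sum(chunk))

print(total)
```

The same chunk-at-a-time pattern applies with the mmap module for raw byte access; the key point is to avoid materializing the whole file as a regular in-memory array.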
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Jérôme Richard |