Appropriate file format for loading a list of Python objects object-by-object
I have a custom Python class Custom and want to dump/load a List[Custom] (I will refer to this as a "chunk" from now on).
Also, consider the following setting:
- Custom is too complex to write a serialize/deserialize procedure by hand.
- Although each instance of Custom is small, the chunk tends to be huge, around 10GB.
- There are many situations where I need only a small portion of the chunk (~10MB).

Currently, I use pickle as the file format. I load the whole 10GB chunk with chunk = pickle.load() and then use only a small portion of it, e.g. chunk_use = chunk[:100].
However, this is inefficient in both memory and computation when I only use a small portion of the chunk. So it would be nice if I could load the chunk object-by-object, like
chunk_use = []
for i in range(100):
    chunk_use.append(load_data(filename, i))
or more concisely
chunk_use = load_data(filename, 1, 100)
Is there an appropriate data format, file format, or library to do this?
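For illustration, here is a minimal sketch of the kind of interface I have in mind, assuming each object is pickled separately and a side file records the byte offset of every record. The helper names dump_chunk and load_data, the ".idx" suffix, and the half-open (start, stop) convention are all hypothetical choices of mine, not an existing library API.

```python
import pickle
import struct

# Assumption: one unsigned 64-bit little-endian byte offset per stored object.
_OFFSET = struct.Struct("<Q")


def dump_chunk(objects, filename):
    """Pickle each object as its own record and write an offset index beside it."""
    with open(filename, "wb") as data, open(filename + ".idx", "wb") as index:
        for obj in objects:
            # Record where this object's pickle starts in the data file.
            index.write(_OFFSET.pack(data.tell()))
            pickle.dump(obj, data, protocol=pickle.HIGHEST_PROTOCOL)


def load_data(filename, start, stop=None):
    """Return object `start`, or the list of objects start..stop-1 if stop is given."""
    single = stop is None
    if single:
        stop = start + 1

    # Look up the byte offset of the first requested object.
    with open(filename + ".idx", "rb") as index:
        index.seek(start * _OFFSET.size)
        raw = index.read(_OFFSET.size)
    if len(raw) < _OFFSET.size:
        raise IndexError("object index out of range")
    (offset,) = _OFFSET.unpack(raw)

    # Jump to that offset and unpickle only the requested records.
    result = []
    with open(filename, "rb") as data:
        data.seek(offset)
        for _ in range(stop - start):
            try:
                result.append(pickle.load(data))
            except EOFError:
                break  # fewer objects stored than requested
    return result[0] if single else result
```

With something like this, chunk_use = load_data(filename, 0, 100) would read only the first 100 records instead of the whole 10GB file. Since the records are still ordinary pickles, Custom would not need a hand-written serializer, but the chunk would have to be written once with dump_chunk rather than a single pickle.dump of the whole list.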
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
