Appropriate file format for loading a list of Python objects object-by-object

I have a custom Python class Custom and want to dump/load a List[Custom] (I will call this list a "chunk" from now on).

Also, consider the following constraints.

  • Custom is too complex for me to hand-write a serialize/deserialize procedure.
  • Although each instance of Custom is small, the chunk tends to be huge, around 10GB.

There are many situations where I need only a small portion of the chunk (~10MB). Currently, I use pickle as the file format. I load the whole 10GB chunk with chunk = pickle.load() and then use only a small portion of it, e.g. chunk_use = chunk[:100].
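For reference, the current approach looks roughly like the sketch below. The helper names dump_whole and load_head are hypothetical and only for illustration; the point is that pickle.load has to deserialize the entire list before any slicing can happen.

```python
import pickle

def dump_whole(filename, chunk):
    """Write the entire list as a single pickle (hypothetical helper)."""
    with open(filename, "wb") as f:
        pickle.dump(chunk, f)

def load_head(filename, n):
    """Return the first n objects (hypothetical helper).

    The whole list is deserialized into memory first, so the cost
    scales with the full chunk size, not with n.
    """
    with open(filename, "rb") as f:
        chunk = pickle.load(f)  # loads every object in the file
    return chunk[:n]
```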

However, this is inefficient in both memory and computation when I only use a small portion of the chunk. So it would be nice if I could load the chunk object-by-object, like

chunk_use = []
for i in range(100):
    chunk_use.append(load_data(filename, i))

or more concisely

chunk_use = load_data(filename, 1, 100)

Is there an appropriate data format, file format, or library to do this?
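To make the desired interface concrete, here is a minimal sketch of one way it might work with plain pickle. It assumes each object is pickled separately into one file and that a sidecar index file of byte offsets is written next to it. The ".idx" suffix, the dump_chunk helper, and the 0-based, half-open (start, stop) semantics of load_data are my assumptions, not an established format. I am not sure this is the best approach, which is why I am asking.

```python
import pickle
import struct

def dump_chunk(filename, chunk):
    """Pickle each object separately and record where each one starts."""
    offsets = []
    with open(filename, "wb") as f:
        for obj in chunk:
            offsets.append(f.tell())  # byte offset of this object's pickle
            pickle.dump(obj, f)
    # Sidecar index (assumed layout): one little-endian uint64 per object.
    with open(filename + ".idx", "wb") as idx:
        idx.write(struct.pack(f"<{len(offsets)}Q", *offsets))

def load_data(filename, start, stop=None):
    """Load object `start`, or objects in [start, stop) if stop is given."""
    single = stop is None
    if single:
        stop = start + 1
    with open(filename + ".idx", "rb") as idx:
        idx.seek(8 * start)  # each index entry is 8 bytes
        raw = idx.read(8 * (stop - start))
    offsets = struct.unpack(f"<{len(raw) // 8}Q", raw)
    if single and not offsets:
        raise IndexError(start)
    objs = []
    with open(filename, "rb") as f:
        for off in offsets:
            f.seek(off)
            objs.append(pickle.load(f))  # reads exactly one object
    return objs[0] if single else objs
```

With this layout, only the requested objects are deserialized, so reading the first 100 objects should not require touching the rest of the 10GB file.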



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
