Best format for Pandas serialization on disk

For my workload, I need to serialize Pandas DataFrames (text and data) to disk, at about 5 GB per DataFrame. I came across various solutions:

HDF5: issues with strings.
Feather: not stable.
CSV: OK, but large file size.
pickle: OK, cross-platform; can we do better?
gzip: same as CSV (slow read access).
SFrame: good, but not maintained anymore.

Is there an alternative to pickle for storing string DataFrames on disk?



Solution 1:[1]

I suggest reading this article: https://towardsdatascience.com/the-best-format-to-save-pandas-data-414dca023e0d

The author concludes that Feather is the most efficient serialization format. However, it is not suitable for long-term storage; for that, CSV is likely the better choice.
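To check this on your own data, a rough sketch along the following lines may help. It writes a small mixed text/numeric DataFrame as CSV, pickle and Feather, then reports file sizes and reads each file back. The helper names `make_frame` and `round_trip` and the sample columns are made up for illustration, and Feather is skipped if `pyarrow` is not installed. Results on a real 5 GB frame will differ, so treat this as a starting point rather than a benchmark.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def make_frame(n_rows: int = 10_000) -> pd.DataFrame:
    """Build a small text + numeric frame (hypothetical stand-in for the real data)."""
    rng = np.random.default_rng(0)
    return pd.DataFrame({
        "text": [f"row-{i}" for i in range(n_rows)],
        "value": rng.random(n_rows),
        "count": rng.integers(0, 100, n_rows),
    })


def round_trip(df: pd.DataFrame, directory: str) -> dict:
    """Write df in several formats; return {format: (size_in_bytes, frame_read_back)}."""
    results = {}

    path = os.path.join(directory, "data.csv")
    df.to_csv(path, index=False)
    results["csv"] = (os.path.getsize(path), pd.read_csv(path))

    path = os.path.join(directory, "data.pkl")
    df.to_pickle(path)
    results["pickle"] = (os.path.getsize(path), pd.read_pickle(path))

    try:
        path = os.path.join(directory, "data.feather")
        df.to_feather(path)  # requires the optional pyarrow dependency
        results["feather"] = (os.path.getsize(path), pd.read_feather(path))
    except ImportError:
        pass  # pyarrow is not available, so Feather is skipped

    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name, (size, _) in round_trip(make_frame(), tmp).items():
            print(f"{name}: {size} bytes")
```

Wrapping the reads and writes with `time.perf_counter()` would add timing to the comparison; on large frames it is worth measuring both, since the smallest file is not always the fastest to load.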

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 felipecrp