Best format for Pandas serialization on disk
For my workload, I need to serialize Pandas DataFrames (text + data) to disk, at about 5 GB per DataFrame. I came across various solutions:
HDF5: issues with strings
Feather: not stable
CSV: Ok, but large file size.
pickle: OK, cross-platform, but can we do better?
gzip: same as CSV (slow for read access).
SFrame: Good, but not maintained anymore.
Is there any alternative to pickle for storing a string DataFrame on disk?
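One way to ground the comparison above is to measure the candidates on a sample of your own data. The sketch below is only an illustration: the helper name `compare_formats`, the file names, and the small sample frame are assumptions, and it covers only the formats that need no extra dependency (CSV, gzip-compressed CSV, pickle).

```python
import os
import tempfile
import time

import pandas as pd


def compare_formats(df, directory):
    """Write df in several formats; return {file name: (size in bytes, read time in s)}."""
    writers = {
        "data.csv": lambda p: df.to_csv(p, index=False),
        "data.csv.gz": lambda p: df.to_csv(p, index=False, compression="gzip"),
        "data.pkl": lambda p: df.to_pickle(p),
    }
    readers = {
        "data.csv": pd.read_csv,
        "data.csv.gz": pd.read_csv,  # compression is inferred from the .gz suffix
        "data.pkl": pd.read_pickle,
    }
    results = {}
    for name, write in writers.items():
        path = os.path.join(directory, name)
        write(path)
        start = time.perf_counter()
        readers[name](path)
        results[name] = (os.path.getsize(path), time.perf_counter() - start)
    return results


# Small hypothetical sample with one text column and one numeric column.
sample = pd.DataFrame({
    "text": ["some repeated text value"] * 10_000,
    "value": range(10_000),
})

with tempfile.TemporaryDirectory() as tmp:
    results = compare_formats(sample, tmp)

for name, (size, seconds) in results.items():
    print(f"{name}: {size} bytes, read in {seconds:.4f} s")
```

Timings on a toy frame like this say little about a 5 GB DataFrame, so it is worth running the same helper on a representative slice of the real data.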
Solution 1:[1]
I suggest reading this article: https://towardsdatascience.com/the-best-format-to-save-pandas-data-414dca023e0d
The author concludes that Feather is the most efficient serialization format. However, it is not well suited to long-term storage; for that, CSV is the likelier choice.
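A minimal sketch of that split, assuming a CSV copy for long-term storage and a Feather copy for fast day-to-day access. The file names and the sample frame are hypothetical, and Feather support in pandas depends on the optional `pyarrow` package, so the Feather step is skipped here if it is not installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample frame with a text column and a numeric column.
df = pd.DataFrame({"text": ["alpha", "beta", "gamma"], "value": [1.5, 2.5, 3.5]})

with tempfile.TemporaryDirectory() as tmp:
    # Long-term copy: plain CSV, readable by almost any tool.
    csv_path = os.path.join(tmp, "archive.csv")
    df.to_csv(csv_path, index=False)
    from_csv = pd.read_csv(csv_path)

    # Working copy: Feather, which pandas reads and writes through pyarrow.
    feather_path = os.path.join(tmp, "cache.feather")
    try:
        df.to_feather(feather_path)
        from_feather = pd.read_feather(feather_path)
    except ImportError:
        from_feather = None  # pyarrow is not installed; only the CSV copy exists

print(from_csv)
```

Note that CSV does not store dtypes, so columns such as dates or categoricals need to be converted again after `pd.read_csv`.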
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | felipecrp |