Pandas throwing error on compressed file (xz)
import pandas as pd
import lzma
df = pd.read_csv('final.csv', header=None)
with open('/xzfolder/final.xz', 'wb') as f:
    f.write(lzma.compress(df.to_records(index=False), format=lzma.FORMAT_XZ))
df = pd.read_csv('/xzfolder/final.xz', header=None)
Above is my code. I am compressing my CSV with lzma, but when I read the compressed file back I get: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf0 in position 8: invalid continuation byte
Solution 1:[1]
I tried your code and hit the same error. I also tried to decompress the created file with a command-line utility (xz on Linux), but even that produced garbage, which indicates that something is wrong with how the file is created.
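A likely reason for the garbage is that to_records() returns a NumPy record array, so what gets compressed is the array's in-memory binary layout rather than CSV text. The sketch below illustrates this with a small, made-up all-numeric DataFrame (the column names a and b are assumptions, not from the question):

```python
import pandas as pd

# Hypothetical sample data standing in for final.csv
df = pd.DataFrame({"a": [1, 2], "b": [3.5, 4.5]})

# to_records() gives a NumPy record array; its bytes are fixed-width binary rows
records = df.to_records(index=False)
raw = records.tobytes()

# to_csv() gives the text that read_csv expects to find after decompression
csv_bytes = df.to_csv(index=False).encode("utf-8")

print(records.dtype)   # structured dtype describing the binary layout
print(len(raw), raw == csv_bytes)
print(csv_bytes)
```

Compressing raw and then asking read_csv to decode it as UTF-8 text is what can lead to a UnicodeDecodeError like the one in the question.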
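I changed the code to use .to_string().encode(), which forces a bytes object, and the error goes away. Note that to_string() writes a whitespace-aligned table rather than comma-separated text, so df.to_csv(index=False).encode() is the closer match if the file must parse back into the same columns.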
import lzma
import pandas as pd
df = pd.read_csv('somefile.txt', header=None)
with open('somez.xz', 'wb') as f:
    f.write(lzma.compress(df.to_string().encode(), format=lzma.FORMAT_XZ))
df_re = pd.read_csv('somez.xz')
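As a variation on the code above (not the answer's original method), the following sketch compresses CSV text instead of the to_string() output, and also shows pandas writing the xz file itself through the compression argument of to_csv. The sample DataFrame and the file names are made up for illustration, and a temporary directory is used so nothing is left on disk:

```python
import lzma
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for the original CSV
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

with tempfile.TemporaryDirectory() as tmp:
    # Option 1: compress the CSV text manually with lzma
    manual_path = os.path.join(tmp, "manual.csv.xz")
    with open(manual_path, "wb") as f:
        csv_bytes = df.to_csv(index=False).encode("utf-8")
        f.write(lzma.compress(csv_bytes, format=lzma.FORMAT_XZ))
    # read_csv infers xz compression from the .xz extension
    df_manual = pd.read_csv(manual_path)

    # Option 2: let pandas handle the compression on write
    builtin_path = os.path.join(tmp, "builtin.csv.xz")
    df.to_csv(builtin_path, index=False, compression="xz")
    df_builtin = pd.read_csv(builtin_path)

print(df_manual)
print(df_builtin)
```

Both options should give back the original columns, because the compressed payload is genuine comma-separated text in each case.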
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Mortz |
