invalid continuation byte when reading file

Here is my line of code:

m_data = pd.read_table(m_path, sep='::', header=None, names=mnames)

It results in this error:

'utf-8' codec can't decode byte 0xe9 in position 3114: invalid continuation byte

I have specified an encoding in my code:

m_data = pd.read_table(m_path, sep='::', header=None, names=mnames,encoding='utf-8')

But the problem still exists. What should I do then?



Solution 1:[1]

'utf-8' codec can't decode byte 0xe9 in position 3114: invalid continuation byte

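This error message means the file is not valid UTF-8, so you should NOT read it with the utf-8 encoding. Note that utf-8 is already the default encoding in pandas, which is why adding encoding='utf-8' changed nothing.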
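To see exactly which bytes trigger the error, you can decode the raw bytes yourself and inspect the exception. This is only a rough sketch using the standard library; the helper name locate_decode_error and the sample bytes are made up for illustration.

```python
def locate_decode_error(raw: bytes, encoding: str = "utf-8", context: int = 10):
    """Return (offset, reason, nearby bytes) for the first decode failure, or None.

    Hypothetical helper, not part of pandas or chardet.
    """
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as err:
        # err.start / err.end mark the byte range the codec rejected.
        lo = max(err.start - context, 0)
        return err.start, err.reason, raw[lo:err.end + context]
    return None


# Made-up sample: 0xe9 is a single character in Latin-1,
# but in UTF-8 it only starts a multi-byte sequence.
sample = "Caf\u00e9::1995\n".encode("latin-1")
print(locate_decode_error(sample))
```

For the real file you would read it in binary mode first, for example raw = open(m_path, "rb").read(), and pass raw to the helper.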

It might be UTF-16, GBK, or some other encoding. Byte 0xe9 is "é" in Latin-1 (ISO-8859-1) and cp1252, so those are reasonable first guesses.
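Trying a few candidates by hand could look roughly like the sketch below. The function name first_decodable and the candidate list are assumptions for illustration. Keep in mind that a successful decode does not prove the encoding is the right one: latin-1 maps every possible byte, so it never fails and is best kept last.

```python
def first_decodable(raw: bytes, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate encoding that decodes raw without error.

    Hypothetical helper; the candidate order is an assumption.
    """
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # this candidate rejected some byte, try the next one
        return name
    return None


# Made-up sample containing the byte 0xe9 from the error message.
sample = "Caf\u00e9::1995\n".encode("latin-1")
print(first_decodable(sample))
```

Whatever name comes back can then be passed to pandas, e.g. pd.read_table(m_path, sep='::', header=None, names=mnames, encoding=name). Check a few rows with accented characters to confirm the text looks right.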

If you still get an error like that after trying a few likely encodings, I suggest the chardet package.

It is very easy to use:

import chardet
with open("your_file", mode="rb") as f:
    print(chardet.detect(f.read(2000)))

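rb means the file is read in binary mode. 2000 is the number of bytes to sample for detection. Usually, the larger the sample, the more accurate the result. The error above was reported at position 3114, so a 2000-byte sample may not include the offending byte; reading a larger sample, or the whole file with f.read(), is safer. chardet.detect returns a dict, and its 'encoding' entry is the name to pass to read_table through the encoding argument.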

chardet - pypi

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 FavorMylikes