How to solve UnicodeDecodeError in Python 3.6?

I switched from Python 2.7 to Python 3.6.

I have scripts that deal with some non-English content.

I usually run the scripts via cron and also in the terminal.

I had a UnicodeDecodeError in my Python 2.7 scripts and solved it like this:

# encoding=utf8  
import sys  

reload(sys)  
sys.setdefaultencoding('utf8')

Now in Python 3.6, this doesn't work. I have print statements like print("Here %s" % (myvar)) and they throw an error. I can work around it by replacing myvar with myvar.encode("utf-8"), but I don't want to write that in every print statement.

I set PYTHONIOENCODING=utf-8 in my terminal, but I still have the issue.

Is there a cleaner way to solve the UnicodeDecodeError issue in Python 3.6?

Is there any way to tell Python 3 to print everything in UTF-8, just like I did in Python 2?



Solution 1:[1]

I had this issue when using Python inside a Docker container based on Ubuntu 18.04. It appeared to be a locale issue, which was solved by adding the following to the Dockerfile:

ENV LANG C.UTF-8

Solution 2:[2]

To everyone using pickle to load a file previously saved in Python 2 and getting a UnicodeDecodeError: try setting the encoding parameter of pickle.load:

import pickle

# Open in binary mode; encoding tells pickle how to decode Python 2 str objects
with open("./data.pkl", "rb") as data_file:
    samples = pickle.load(data_file, encoding='latin1')
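
To see what the encoding parameter changes, here is a minimal sketch. The byte string py2_payload is hand-written for illustration and is assumed to match what Python 2 would produce for pickle.dumps('caf\xe9', 2); it is not taken from a real file.

```python
import pickle

# Hand-written protocol-2 pickle of the Python 2 byte string 'caf\xe9'
# (an illustrative stand-in for data saved by Python 2).
py2_payload = b"\x80\x02U\x04caf\xe9."

# By default Python 3 decodes Python 2 str objects as ASCII, which fails
# on the non-ASCII byte 0xe9.
try:
    pickle.loads(py2_payload)
    default_failed = False
except UnicodeDecodeError:
    default_failed = True

# latin1 maps every byte to a code point, so decoding cannot fail.
as_text = pickle.loads(py2_payload, encoding="latin1")

# encoding="bytes" skips decoding and returns the raw bytes instead.
as_bytes = pickle.loads(py2_payload, encoding="bytes")

print(default_failed, repr(as_text), repr(as_bytes))
```

Note that latin1 never raises, but if the original strings were really UTF-8 the resulting text will be wrong. In that case encoding="bytes" followed by an explicit decode is probably the safer route.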

Solution 3:[3]

For a Python-only solution you will have to recreate your sys.stdout object:

import sys, codecs
sys.stdout = codecs.getwriter('utf-8')(sys.stdout.detach())

After this, a normal print("hello world") should be encoded to UTF-8 automatically.
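
An alternative sketch of the same idea uses io.TextIOWrapper, which gives you a regular text stream again. The helper name utf8_text_stream is my own, and the demonstration writes to an in-memory buffer rather than the real stdout so that it is safe to run anywhere.

```python
import io


def utf8_text_stream(binary_stream):
    """Wrap a binary stream so that text written to it is encoded as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8", line_buffering=True)


# Demonstration on an in-memory buffer instead of the real stdout.
buffer = io.BytesIO()
out = utf8_text_stream(buffer)
print("h\u00e9llo", file=out)
out.flush()
encoded = buffer.getvalue()  # the UTF-8 bytes that were actually written

# In a real script the same helper would be applied to stdout:
#   import sys
#   sys.stdout = utf8_text_stream(sys.stdout.detach())
```

On Python 3.7 and later, sys.stdout.reconfigure(encoding='utf-8') does this in one call, but it is not available in 3.6.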

But you should try to find out why your terminal is set to such a strange encoding (which Python just tries to adapt to). Maybe your operating system is misconfigured somehow.

EDIT: In my tests unsetting the env variable LANG produced this strange setting for the stdout encoding for me:

LANG= python3
import sys
sys.stdout.encoding

printed 'ANSI_X3.4-1968'.

So I guess you might want to set your LANG to something like en_US.UTF-8. Your terminal program doesn't seem to do this.
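
To check which encoding Python picks under a given environment (for example, the minimal environment that cron provides), something like the following sketch may help. The helper name child_stdout_encoding is made up for this example, and the results depend on your Python version and platform.

```python
import os
import subprocess
import sys


def child_stdout_encoding(env_overrides):
    """Start a fresh interpreter with extra environment variables and
    return the encoding it chooses for sys.stdout."""
    env = dict(os.environ)
    env.update(env_overrides)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


# Compare the current environment with a few overrides.
print("current env:      ", child_stdout_encoding({}))
print("LANG=en_US.UTF-8: ", child_stdout_encoding({"LANG": "en_US.UTF-8"}))
print("PYTHONIOENCODING: ", child_stdout_encoding({"PYTHONIOENCODING": "utf-8"}))
```

If the first line shows something other than UTF-8, the environment rather than your script is the likely culprit. Keep in mind that cron does not read your shell profile, so variables exported in the terminal are not visible there.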

Solution 4:[4]

Python 3 (including 3.6) already supports Unicode natively. See the documentation: https://docs.python.org/3/howto/unicode.html

So you don't need to force Unicode support as you did in Python 2.7. Try running your code normally. If you get an error while reading a Unicode text file, pass the encoding='utf-8' parameter when opening the file.
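
As a small sketch of that last point: the temporary file below is only a stand-in for your real data file, and the sample text is made up.

```python
import os
import tempfile

text = "na\u00efve caf\u00e9"

# Create a small UTF-8 file to read back (stand-in for a real data file).
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as handle:
    handle.write(text)
    path = handle.name

try:
    # Without encoding=..., open() falls back to the locale's preferred
    # encoding, which is where the UnicodeDecodeError usually comes from.
    with open(path, encoding="utf-8") as handle:
        content = handle.read()
finally:
    os.remove(path)

print(content == text)
```

Passing the encoding explicitly makes the script behave the same in the terminal, under cron, and inside a container, regardless of the locale settings.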

Solution 5:[5]

For Docker with Python 3.6, running LANG=C.UTF-8 python (or jupyter xxx) works for me. Thanks to @Daniel and @zhy.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Daniel
Solution 2 Mark Storm
Solution 3
Solution 4 ananto30
Solution 5 zhibo