Python unicode: why does it work on one machine but sometimes fail on another?

I find unicode in Python really troublesome. Why doesn't Python use UTF-8 for all strings? I am in China, so I have to use Chinese strings that can't be represented in ASCII. I use u'' to denote a unicode string; it works well on my Ubuntu machine, but on another Ubuntu machine (a VPS provided by linode.com) it sometimes fails. The error is:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)

The code I am using is:

self.talk(user.record["fullname"] + u"准备好了")
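
For context, the failure comes from Python 2's implicit conversion: concatenating a bytestring (str) with a unicode literal makes Python decode the bytestring with the default ASCII codec. A minimal sketch that reproduces the same kind of error, assuming user.record["fullname"] holds a UTF-8 bytestring (the name below is hypothetical):

# -*- coding: utf-8 -*-
# Hypothetical Python 2 reproduction: fullname is a UTF-8 bytestring, not unicode
fullname = "\xe9\xa9\xac\xe5\x86\xac\xe6\xa2\x85"   # UTF-8 bytes for a Chinese name; first byte is 0xe9
print fullname + u"准备好了"                          # str + unicode triggers an implicit ascii decode -> UnicodeDecodeError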


Solution 1:[1]

You need to decode all non-Unicode strings as early as possible. Try to ensure you have no UTF-8 bytestrings stored anywhere in memory, only unicode objects. For example, make sure the elements of user.record are all converted to unicode when they are created, so you never hit errors like this one. Or just use Python 3, where it is hard to mix the two.
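
A minimal sketch of decoding at the boundary in Python 2, assuming user.record is a plain dict whose values arrive as UTF-8 bytestrings (the record contents here are hypothetical):

# -*- coding: utf-8 -*-
# Hypothetical Python 2 sketch: convert every bytestring value to unicode on creation
record = {"fullname": "\xe9\xa9\xac\xe5\x86\xac\xe6\xa2\x85"}   # UTF-8 bytes, e.g. from a database

# Decode once, as early as possible, so only unicode objects live in memory
record = dict((key, value.decode("utf-8") if isinstance(value, str) else value)
              for key, value in record.items())

# Mixing with unicode literals is now safe
message = record["fullname"] + u"准备好了"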

Solution 2:[2]

Because in Python 2.x the default encoding is ASCII unless it is changed manually. Here is a crude hack to include in your script before any other code:

import sys
reload(sys)                       # reload restores setdefaultencoding, which site.py removes at startup
sys.setdefaultencoding("utf-8")

This changes the default Python encoding to UTF-8.
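
A quick way to check that the hack took effect, as a sketch (the bytestring below is just an example UTF-8 value):

# -*- coding: utf-8 -*-
import sys
reload(sys)                               # site.py deletes setdefaultencoding; reload(sys) restores it
sys.setdefaultencoding("utf-8")

print sys.getdefaultencoding()            # 'utf-8' instead of the stock 'ascii'
print repr("\xe9\xa9\xac" + u"准备好了")   # the implicit decode now uses UTF-8 and succeeds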

Solution 3:[3]

It took me a long time, but I found it.

Look at printenv, especially LANG:

LANG=en_CA        <- server 2 (not working)

LANG=en_US.UTF-8  <- server 1 (working; on Linode, coincidentally)

Set the new locale:

sudo update-locale LANG=en_US.UTF-8 LANGUAGE

Log out, log back in, and Bob's your uncle :)
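
As a sanity check after logging back in, Python's view of the locale can be inspected directly (a sketch; the values shown are just what you would expect with the LANG settings above):

import locale
import sys

print locale.getdefaultlocale()   # e.g. ('en_US', 'UTF-8') with LANG=en_US.UTF-8
print sys.stdout.encoding         # what print uses for unicode output, e.g. 'UTF-8' (or None when piped)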

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Rosh Oxymoron
Solution 2: ismail
Solution 3: Glen Bizeau