What is the list of Python settings that affect encoding, decoding, and printing?
When I run into Unicode printing problems, I want to know what I should check. In my particular case, I'm using an installed module that prints Unicode characters using the wrong codec.
There are several disparate places that affect how Python encodes and decodes text under a variety of circumstances, and specifically how Python handles printable data.
Some things off the top of my head:
- general environment variables
  - LC_ALL, LANG
- Python sys module settings
  - sys.getdefaultencoding()
What else am I forgetting?
I'm only interested in python 3.
Solution 1:[1]
things to check
Here is what I found, in order of how I recommend checking them:
- environment variables
  - LC_ALL, LANG, LC_CTYPE, LANGUAGE
- Python-specific environment variables
  - PYTHONIOENCODING, PYTHONCOERCECLOCALE (these are ignored when Python is started with the -E option; check sys.flags.ignore_environment)
- Windows-specific console encoding
  - PYTHONLEGACYWINDOWSSTDIO
- Python sys module
  - function sys.getdefaultencoding() (the corollary function sys.setdefaultencoding was removed from Python 3)
  - sys.stdin.encoding, sys.stdout.encoding, sys.stderr.encoding
  - file system encoding setting: function sys.getfilesystemencoding()
- Python file header coding:, as in # -*- coding: utf-8 -*-
  - affects how the parser decodes the source file, including its string literals
- locale module
  - function call locale.nl_langinfo(locale.CODESET) (not available on Windows Python 3.7; worked on Debian Python 3.5)
  - function locale.getdefaultlocale
  - function locale.getpreferredencoding (works differently on some systems)
- gettext module and its various facilities (too many to list all of them)
  - contents of the directories passed to some functions like gettext.install(application, directory) or gettext.bindtextdomain(domain, directory)
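The interaction between PYTHONIOENCODING and the -E option can be observed directly. One way is to start a child interpreter with a controlled environment and ask it which sys.stdout.encoding it ended up with.

This is a minimal sketch that assumes subprocess can launch sys.executable. The helper name child_stdout_encoding is hypothetical, and latin-1 is an arbitrary choice of codec.

```python
import codecs
import os
import subprocess
import sys

# Code run in the child interpreter: report its stdout encoding and
# whether -E (ignore PYTHON* environment variables) is in effect.
CHILD_CODE = ("import sys; "
              "print(sys.stdout.encoding, int(sys.flags.ignore_environment))")


def child_stdout_encoding(ioencoding, ignore_env=False):
    """Start a fresh interpreter with PYTHONIOENCODING=ioencoding.

    Returns (normalized codec name of the child's sys.stdout, -E flag).
    """
    env = dict(os.environ, PYTHONIOENCODING=ioencoding)
    args = [sys.executable]
    if ignore_env:
        args.append("-E")  # -E: ignore all PYTHON* environment variables
    args += ["-c", CHILD_CODE]
    result = subprocess.run(args, env=env, stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    encoding, flag = result.stdout.split()
    # codecs.lookup() maps aliases such as "UTF-8" and "utf8" to one name
    return codecs.lookup(encoding).name, flag == "1"


if __name__ == "__main__":
    print(child_stdout_encoding("latin-1"))
    print(child_stdout_encoding("latin-1", ignore_env=True))
```

With -E the child ignores PYTHONIOENCODING and falls back to its locale-derived default. As a result, the second call typically reports a codec other than Latin-1.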
print the values
Here is a quick script to list the values of most of these:
```python
import os, sys, locale

print('environment:')
# sys.flags.ignore_environment is set when Python runs with -E
print('-E (ignore PYTHON* environment variables) ? %s' %
      bool(sys.flags.ignore_environment))
for env in ('LC_ALL', 'LANG', 'LC_CTYPE',
            'LANGUAGE', 'PYTHONIOENCODING',
            'PYTHONLEGACYWINDOWSSTDIO'):
    if env in os.environ:
        print('"%s"="%s"' % (env, os.environ[env]))
    else:
        print('"%s" not set' % env)
print()

print('sys module:')
print('getdefaultencoding "%s"' % sys.getdefaultencoding())
# codecs used by the standard streams (print() writes to sys.stdout)
print('sys.stdin.encoding "%s"' % sys.stdin.encoding)
print('sys.stdout.encoding "%s"' % sys.stdout.encoding)
print('sys.stderr.encoding "%s"' % sys.stderr.encoding)
print()

print('locale:')
# nl_langinfo() and CODESET are only available on Unix
try:
    print('locale.nl_langinfo(locale.CODESET) "%s"'
          % locale.nl_langinfo(locale.CODESET))
except AttributeError:
    print('locale.nl_langinfo not available')
# getdefaultlocale() is deprecated since Python 3.11
print('locale.getdefaultlocale()[1] "%s"'
      % locale.getdefaultlocale()[1])
print('locale.getpreferredencoding() "%s"'
      % locale.getpreferredencoding())
```
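The script above skips a few items from the checklist: PYTHONCOERCECLOCALE, the file system encoding, and whether UTF-8 Mode (PEP 540, Python 3.7+) is active.

The following is a sketch that gathers those items into a dict so they can be compared across systems. encoding_report is a hypothetical helper name, and the getattr fallback is an assumption for interpreters older than 3.7.

```python
import os
import sys


def encoding_report():
    """Gather checklist items that the print script above leaves out."""
    return {
        # PEP 538 (Python 3.7+): controls coercion of the C locale
        "PYTHONCOERCECLOCALE": os.environ.get("PYTHONCOERCECLOCALE"),
        # sys.flags.utf8_mode only exists on Python 3.7+ (PEP 540)
        "utf8_mode": bool(getattr(sys.flags, "utf8_mode", 0)),
        # codec and error handler used for file names and paths
        "filesystem_encoding": sys.getfilesystemencoding(),
        "filesystem_errors": sys.getfilesystemencodeerrors(),
    }


if __name__ == "__main__":
    for name, value in sorted(encoding_report().items()):
        print('%s "%s"' % (name, value))
```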
printed values on three systems
- Windows 10 with 3.7
- Debian 9 with 3.5
- Ubuntu 14.04 with 3.4
On Windows 10 using Python 3.7 within the built-in PowerShell terminal, this prints:
```
environment:
-E (ignore PYTHON* environment variables) ? False
"LC_ALL" not set
"LANG" not set
"LC_CTYPE" not set
"LANGUAGE" not set
"PYTHONIOENCODING"="UTF-8"
"PYTHONLEGACYWINDOWSSTDIO" not set

sys module:
getdefaultencoding "utf-8"
sys.stdin.encoding "UTF-8"
sys.stdout.encoding "UTF-8"
sys.stderr.encoding "UTF-8"

locale:
locale.nl_langinfo not available
locale.getdefaultlocale()[1] "cp1252"
locale.getpreferredencoding() "cp1252"
```
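In the Windows output above, sys.stdout uses UTF-8 while the locale reports cp1252. Mojibake appears when one piece of code encodes text with one of these codecs and another piece decodes or displays the bytes with the other. That matches the "wrong codec" symptom from the question.

The following is a small sketch that reproduces both directions of the mismatch. mojibake() is a hypothetical helper, and "café" is an arbitrary test string.

```python
def mojibake(text, written_as, read_as):
    """Encode text with one codec, then decode the bytes with another.

    errors="replace" turns bytes that are invalid in `read_as` into
    U+FFFD instead of raising UnicodeDecodeError.
    """
    return text.encode(written_as).decode(read_as, errors="replace")


word = "caf\u00e9"  # "café"
utf8_read_as_cp1252 = mojibake(word, "utf-8", "cp1252")  # "cafÃ©"
cp1252_read_as_utf8 = mojibake(word, "cp1252", "utf-8")  # "caf\ufffd"
# ascii() keeps the output printable even on a non-UTF-8 console
print(ascii(utf8_read_as_cp1252), ascii(cp1252_read_as_utf8))
```

The "Ã©" pattern usually means UTF-8 bytes were decoded as cp1252 (or Latin-1). A replacement character or UnicodeDecodeError usually points the other way.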
On Debian 9 using Python 3.5, this prints:
```
environment:
-E (ignore PYTHON* environment variables) ? False
"LC_ALL" not set
"LANG"="en_GB.UTF-8"
"LC_CTYPE" not set
"LANGUAGE" not set
"PYTHONIOENCODING" not set
"PYTHONLEGACYWINDOWSSTDIO" not set

sys module:
getdefaultencoding "utf-8"
sys.stdin.encoding "UTF-8"
sys.stdout.encoding "UTF-8"
sys.stderr.encoding "UTF-8"

locale:
locale.nl_langinfo(locale.CODESET) "UTF-8"
locale.getdefaultlocale()[1] "UTF-8"
locale.getpreferredencoding() "UTF-8"
```
On Ubuntu 14.04 using Python 3.4, this prints:
```
environment:
-E (ignore PYTHON* environment variables) ? False
"LC_ALL" not set
"LANG"="en_US.UTF-8"
"LC_CTYPE" not set
"LANGUAGE"="en_US:"
"PYTHONIOENCODING" not set
"PYTHONLEGACYWINDOWSSTDIO" not set

sys module:
getdefaultencoding "utf-8"
sys.stdin.encoding "UTF-8"
sys.stdout.encoding "UTF-8"
sys.stderr.encoding "UTF-8"

locale:
locale.nl_langinfo(locale.CODESET) "UTF-8"
locale.getdefaultlocale()[1] "UTF-8"
locale.getpreferredencoding() "UTF-8"
```
Unfortunately, when I run into Unicode printing problems with installed modules, it is not immediately obvious which setting affects that module. Understanding how these parameters and settings interact is even more confounding, and there are many combinations of settings to test.
But this little bit might help someone get started.
Also see the helpful answers to the Stack Overflow question "How to set sys.stdout encoding in Python 3?"
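On Python 3.7+, one way to change the encoding of an existing text stream is io.TextIOWrapper.reconfigure(). sys.stdout is normally an io.TextIOWrapper.

The sketch below applies reconfigure() to a TextIOWrapper over an in-memory io.BytesIO instead of the real stdout, so the resulting bytes can be inspected. write_with_switch is a hypothetical helper, and cp1252/UTF-8 are chosen only to mirror the Windows output above.

```python
import io


def write_with_switch(first, second, before="cp1252", after="utf-8"):
    """Write two strings, switching the stream's codec in between."""
    raw = io.BytesIO()
    # stand-in for sys.stdout: a text wrapper around a byte buffer
    stream = io.TextIOWrapper(raw, encoding=before)
    stream.write(first)
    # reconfigure() (Python 3.7+) flushes pending text, then swaps the codec
    stream.reconfigure(encoding=after)
    stream.write(second)
    stream.flush()
    return raw.getvalue()


if __name__ == "__main__":
    # "é" becomes b'\xe9' under cp1252 and b'\xc3\xa9' under UTF-8
    print(write_with_switch("\u00e9", "\u00e9"))
```

For the real stream, the equivalent call is sys.stdout.reconfigure(encoding="utf-8") early in the program. This assumes nothing has replaced sys.stdout with an object that lacks reconfigure().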
Related PEPs to review
- PEP 540 -- Add a new UTF-8 Mode (python -X utf8 ...)
- PEP 529 -- Change Windows filesystem encoding to UTF-8 (environment variable PYTHONLEGACYWINDOWSFSENCODING)
- PEP 263 -- Defining Python Source Code Encodings (# -*- coding: ... -*-)
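The PEP 263 coding declaration only controls how the source bytes are decoded into text. compile() accepts raw bytes and honours a coding line in the first two lines; without one, it assumes UTF-8. That makes the effect easy to show without writing files.

This is a minimal sketch: decode_literal is a hypothetical helper, and the Latin-1 "café" literal is an arbitrary example.

```python
def decode_literal(source_bytes):
    """Compile raw source bytes and return the value bound to `s`.

    compile() applies PEP 263: a coding declaration on line 1 or 2
    selects the codec, otherwise UTF-8 is assumed.
    Returns None if the bytes cannot be decoded as source code.
    """
    namespace = {}
    try:
        exec(compile(source_bytes, "<demo>", "exec"), namespace)
    except SyntaxError:
        return None
    return namespace["s"]


# The same assignment stored as Latin-1 bytes ("é" is the single byte 0xE9).
body = "s = 'caf\u00e9'\n"
with_cookie = ("# -*- coding: latin-1 -*-\n" + body).encode("latin-1")
without_cookie = body.encode("latin-1")

print(ascii(decode_literal(with_cookie)))  # 'caf\xe9'
print(decode_literal(without_cookie))      # None: 0xE9 is not valid UTF-8
```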
Some help came from a PyMOTW article, the Python Unicode HOWTO, and the Python documentation for the sys and locale modules.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
