Is there a way to change Python's open() default text encoding?
Can I change the default text encoding of open() (io.open() in 2.7) in a cross-platform way, so that I don't need to specify open(..., encoding='utf-8') each time?
In text mode, if encoding is not specified the encoding used is platform dependent:
locale.getpreferredencoding(False) is called to get the current locale encoding.
The documentation doesn't say how to set the preferred encoding, though. The function is in the locale module, so do I need to change the locale? Is there a reliable cross-platform way to set a UTF-8 locale? Will it affect anything other than the default text file encoding?
Or are locale changes dangerous (they can break something), so that I should stick to a custom wrapper such as:

    def uopen(*args, **kwargs):
        return open(*args, encoding='UTF-8', **kwargs)
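To see which encoding a bare open() would pick on a given machine, one can compare locale.getpreferredencoding(False) with the encoding attribute of a file opened without an encoding argument. This is a minimal sketch assuming Python 3; the helper name default_text_encoding and the temporary file are illustrative only.

```python
import codecs
import locale
import os
import tempfile


def default_text_encoding():
    """Return the normalized encoding that open() uses when none is given."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w") as f:  # deliberately no encoding argument
            return codecs.lookup(f.encoding).name
    finally:
        os.remove(path)


# codecs.lookup() normalizes spellings such as "UTF-8" and "utf8" to one name.
locale_encoding = codecs.lookup(locale.getpreferredencoding(False)).name
print("locale preferred encoding:", locale_encoding)
print("open() default encoding:  ", default_text_encoding())
```

On a typical Linux or macOS setup both lines report utf-8; on Windows they often report a legacy code page, which is the portability problem the question describes.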
Solution 1:[1]
You can set the encoding, but it's really hacky (and works in Python 2 only):

    import sys
    sys.getdefaultencoding()         # returns your default encoding
    sys.setdefaultencoding("utf8")   # AttributeError: site.py removes setdefaultencoding at startup, but...
    reload(sys)                      # Python 2 only: restores the removed attribute
    sys.setdefaultencoding("utf8")   # now it succeeds
I would instead do
main_script.py
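    import __builtin__  # Python 2 name; the module is called builtins in Python 3
    import io

    def uopen(*args, **kwargs):
        # Call io.open, not open: after the assignment below, the name open
        # resolves to uopen itself (infinite recursion), and Python 2's
        # built-in open() does not accept an encoding argument.
        return io.open(*args, encoding='UTF-8', **kwargs)

    __builtin__.open = uopen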
Then anywhere you call open(), it will use the UTF-8 encoding. However, it may raise errors if you also pass an encoding explicitly.
Or just pass the encoding explicitly every time you open a file, or use your wrapper.
Python's general philosophy is "explicit is better than implicit", which implies that the "right" solution is to declare the encoding explicitly whenever you open a file.
Solution 2:[2]
If you really need to change the default encoding, you can replace the built-in open function.
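    import builtins  # more reliable than __builtins__, which is a dict in imported modules

    original_open = builtins.open

    def uopen(*args, **kwargs):
        # Only text mode takes an encoding; leave binary mode untouched.
        mode = args[1] if len(args) >= 2 else kwargs.get("mode", "r")
        if "b" not in mode:
            kwargs.setdefault("encoding", "UTF-8")
        return original_open(*args, **kwargs)

    builtins.open = uopen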
I wrote and tested this snippet after finding a mailing-list thread about replacing print.
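Because replacing the built-in open affects every module in the process, it may be safer to confine the patch to a block of code. The following is a sketch rather than part of the original answer: utf8_open_default is a hypothetical helper name, and Python 3 is assumed.

```python
import builtins
import contextlib
import os
import tempfile


@contextlib.contextmanager
def utf8_open_default():
    """Temporarily make UTF-8 the default encoding for text-mode open()."""
    original_open = builtins.open

    def patched_open(file, mode="r", *args, **kwargs):
        # Add an encoding only in text mode, and only if the caller gave none
        # (positional order is: file, mode, buffering, encoding).
        if "b" not in mode and len(args) < 2:
            kwargs.setdefault("encoding", "utf-8")
        return original_open(file, mode, *args, **kwargs)

    builtins.open = patched_open
    try:
        yield
    finally:
        builtins.open = original_open  # always restore the real open()


path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with utf8_open_default():
    with open(path, "w") as f:  # no encoding argument, UTF-8 is applied
        f.write("caf\u00e9")

with open(path, "rb") as f:
    print(f.read())  # b'caf\xc3\xa9'
```

Outside the with block, open() behaves exactly as before, so third-party code that relies on the locale default is not affected.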
Solution 3:[3]
Maybe PEP 540 (UTF-8 Mode, available since Python 3.7) is what you want:
https://peps.python.org/pep-0540/
Use -Xutf8
python.exe -Xutf8 -c "open('tmp.txt', 'w', encoding='utf8').write('????0123'); print(open('tmp.txt').read())"
Use PYTHONUTF8 in PowerShell
$env:PYTHONUTF8=1; python.exe -c "open('tmp.txt', 'w', encoding='utf8').write('????0123'); print(open('tmp.txt').read())"
Use PYTHONUTF8 in Cmd
set PYTHONUTF8=1&& python.exe -c "open('tmp.txt', 'w', encoding='utf8').write('????0123'); print(open('tmp.txt').read())"
You can also run setx PYTHONUTF8 1 to save it as a user-level environment variable.
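Whether UTF-8 Mode is actually active can be checked from inside Python through sys.flags.utf8_mode (Python 3.7+). The sketch below starts child interpreters with -X utf8 and with PYTHONUTF8=1, then reports the flag together with the encoding a bare open() picks. The helper name probe and the temporary file are illustrative, not part of the original answer.

```python
import os
import subprocess
import sys
import tempfile

# Code run in the child: report the UTF-8 Mode flag and open()'s default encoding.
CHILD_CODE = (
    "import sys, codecs; "
    "f = open(sys.argv[1]); "
    "print(sys.flags.utf8_mode, codecs.lookup(f.encoding).name); "
    "f.close()"
)


def probe(extra_args=(), extra_env=None):
    """Run CHILD_CODE in a fresh interpreter and return its output fields."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    env = dict(os.environ)
    env.pop("PYTHONUTF8", None)  # start from a known state
    env.update(extra_env or {})
    try:
        out = subprocess.run(
            [sys.executable, *extra_args, "-c", CHILD_CODE, path],
            env=env, capture_output=True, text=True, check=True,
        ).stdout
    finally:
        os.remove(path)
    return out.split()


print(probe(extra_args=["-X", "utf8"]))      # flag set on the command line
print(probe(extra_env={"PYTHONUTF8": "1"}))  # flag set via the environment
```

Both calls should report the flag as 1 and the encoding as utf-8, regardless of the locale of the machine running them.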
Solution 4:[4]
I would not change anything in the locale, as it could have a lot of side effects in other parts of your system. open() relies on system-level settings, so changing them can have effects beyond open() itself, or at a minimum on other Python programs that use the same Python installation. Your wrapper looks appropriate, is clean and portable, and looks like the correct solution.
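A compact variant of the wrapper recommended here can be built with functools.partial, which still lets callers override the encoding. This is a sketch assuming Python 3; the name uopen follows the question. The wrapper is meant for text mode only, since open() rejects an encoding argument in binary mode.

```python
import functools
import os
import tempfile

# Text-mode open() with UTF-8 as the default; an explicit encoding= still wins.
uopen = functools.partial(open, encoding="utf-8")

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with uopen(path, "w") as f:
    f.write("na\u00efve")

with uopen(path) as f:
    print(f.encoding)  # utf-8

with uopen(path, encoding="latin-1") as f:
    print(f.encoding)  # latin-1
```

Unlike patching the built-in, this leaves open() untouched for every other module, at the cost of having to import and call uopen explicitly.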
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Joran Beasley |
| Solution 2 | |
| Solution 3 | BaiJiFeiLong |
| Solution 4 | Philip Massey |
