Why can't rstrip return the raw text in Python?
I am trying to print a text in Spanish line by line using the following Python code:
path = 'segismundo.txt' #set the path file
f = open(path, encoding="utf-8")
lines = [x.rstrip() for x in open(path)]
print(lines)
The raw text is:
Sueña el rico en su riqueza,
que más cuidados le ofrece;
sueña el pobre que padece
su miseria y su pobreza;
However, the result is:
['Sue帽a el rico en su riqueza,', 'que m谩s cuidados le ofrece;', '', 'sue帽a el pobre que padece', 'su miseria y su pobreza;', '']
My system language is Chinese (the weird characters such as '帽' and '谩' are Chinese characters), so I am wondering whether this happens because the rstrip method can only handle English text.
Solution 1:[1]
Encoding and decoding is a finicky subject, especially because current software has to try to maintain compatibility with pre-Unicode software and files.
The text you list is not "raw" in the sense that it is not what is actually stored in the file. Files in most file systems contain bytes, and you have to learn which encoding those bytes use in some other way. To help with that, Python by default picks the encoding for opening files based on the locale settings. You can override that with the encoding argument to open, as you did on the line starting with f = ..., but crucially not on the next line, where you open the same file again with the default encoding.
print has a similar issue: its output can be written to a file, shown on a terminal, or piped to another process, but all of those operate on sequences of raw bytes, so strings need to be encoded first.
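To see which defaults are in play on a given machine, a small check along these lines may help. It is only a sketch: the variable names are made up for illustration, and the values it reports depend entirely on your system and Python configuration.

```python
import locale
import sys

# Encoding derived from the locale settings; open() falls back to it
# when no encoding argument is given.
default_file_encoding = locale.getpreferredencoding(False)

# Encoding print() uses when writing to standard output.
# getattr guards against stdout having been replaced by an object
# that has no encoding attribute.
stdout_encoding = getattr(sys.stdout, "encoding", None)

print("open() default:", default_file_encoding)
print("stdout:", stdout_encoding)
```

If the first value is something other than UTF-8 (on a Chinese Windows system it is often cp936, also known as GBK), reading a UTF-8 file without an explicit encoding will likely produce garbled text.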
So there are two potential mismatches in your code:
- The file is encoded with UTF-8 but gets decoded using your system default which may not be UTF-8.
- The output gets encoded with your system default encoding but your terminal assumes it is some other encoding.
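The first mismatch can be reproduced without any file. The sketch below assumes the system default is GBK, which is a guess based on the Chinese characters in the question's output, not something the question states.

```python
original = "Sueña el rico en su riqueza,"

# What is actually stored in a UTF-8 file: bytes, not characters.
raw_bytes = original.encode("utf-8")

# Decoding those bytes with the wrong codec (GBK is an assumption here)
# turns the two bytes of 'ñ' into a single Chinese character.
garbled = raw_bytes.decode("gbk")
print(garbled == "Sue帽a el rico en su riqueza,")

# Reversing the wrong step recovers the original text in this case.
restored = garbled.encode("gbk").decode("utf-8")
print(restored == original)
```

Note that rstrip plays no part in this: the text is already garbled by the time the decoded string exists.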
Given the clues present in your question, my guess would be you simply need to change the line where you read the text to:
lines = [x.rstrip() for x in f]
You also never close the file, which is usually not an issue, but something to keep in mind for larger applications: you don't want to keep files open when you don't have to.
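One common way to handle both points at once is a with block, which closes the file automatically. This is a minimal sketch: it writes a small sample file to a temporary directory first so that it can run on its own, whereas in your case the file already exists.

```python
import os
import tempfile

sample = "Sueña el rico en su riqueza,\nque más cuidados le ofrece;\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical location, used only to make the example self-contained.
    path = os.path.join(tmp_dir, "segismundo.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)

    # Pass the encoding explicitly and iterate over this file object;
    # the file is closed as soon as the with block ends.
    with open(path, encoding="utf-8") as f:
        lines = [line.rstrip() for line in f]

print(len(lines), "lines read")
```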
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Jasmijn |
