E-Mail decode issue: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 135: invalid start byte
E-mail clients decode these messages correctly, so I assume there must also be a way to decode them correctly with Python.
I use Python's built-in email library to process incoming emails.
import email
...
email_message = email.message_from_file(fp)
email_message.is_multipart() # => False
email_message.get_content_type() # 'text/plain'
to_decode = email_message.get_payload(decode=True)
charset = email_message.get_content_charset()
# charset is utf-8
to_decode.decode(charset)
Exception:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 135: invalid start byte
This is part of the bytes held in the to_decode variable:
b'Dzie\\u0144 dobry,\n\nniestety w podany'
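To see exactly which byte trips the decoder, the attributes on `UnicodeDecodeError` (`start`, `end`, `reason`) can be used to print the surrounding bytes. The following is only a diagnostic sketch: the helper name `show_decode_failure` and the sample bytes (including where the `0xa0` sits) are made up for illustration and are not taken from the real message.

```python
def show_decode_failure(data: bytes, charset: str, context: int = 20) -> str:
    """Describe the first undecodable position and show the bytes around it."""
    try:
        data.decode(charset)
    except UnicodeDecodeError as exc:
        lo = max(exc.start - context, 0)
        hi = exc.end + context
        return f"{exc.reason} at {exc.start}: {data[lo:hi]!r}"
    return "decodes cleanly"


# Hypothetical sample: a literal backslash-u escape next to a lone 0xa0 byte.
sample = b"Dzie\\u0144 dobry,\xa0niestety w podany"
report = show_decode_failure(sample, "utf-8")
print(report)
```

A lone `0xa0` is a non-breaking space in Latin-1 and Windows-1252, so seeing it in data declared as UTF-8 suggests the bytes did not come straight from a UTF-8 encoded message.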
I figured out by trial and error that I can do the following:
test = b'Dzie\\u0144 dobry,\n\nniestety w podany'
test.decode('unicode-escape')
>> output: 'Dzień dobry,\n\nniestety w podany'
That output is correct, but I think there must be a better way than guessing. How does my email client do this?
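One possible explanation, assuming `fp` above was opened in text mode: the parser then receives text that has already been decoded, and `get_payload(decode=True)` has to turn it back into bytes without knowing the original encoding. That can produce escape sequences such as `\\u0144` next to stray bytes such as `0xa0`. A minimal sketch of the byte-based alternative is shown below; the sample message is invented for illustration, and whether this matches the message in question is an assumption.

```python
import email
from email import policy

# Hypothetical sample: a UTF-8 message sent with 8bit transfer encoding.
RAW = (
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: 8bit\n"
    "\n"
    "Dzie\u0144 dobry,\u00a0niestety w podany\n"
).encode("utf-8")

# Legacy API: parse the raw bytes, then decode the payload with the
# declared charset. errors="replace" keeps one bad byte from raising.
legacy = email.message_from_bytes(RAW)
charset = legacy.get_content_charset() or "utf-8"
legacy_text = legacy.get_payload(decode=True).decode(charset, errors="replace")

# Modern API: with policy.default the parser returns an EmailMessage,
# and get_content() returns the text part already decoded to str.
modern = email.message_from_bytes(RAW, policy=policy.default)
modern_text = modern.get_content()

print(legacy_text)
print(modern_text)
```

For a message stored on disk, the equivalent would be to open the file with mode `"rb"` and pass it to `email.message_from_binary_file`, so that the parser always sees the original bytes.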
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
