E-Mail decode issue: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 135: invalid start byte

Email clients decode these messages correctly, so I assume there must also be a way to decode them correctly with Python.

I use Python's built-in email library to process incoming emails.

import email

...
email_message = email.message_from_file(fp)
email_message.is_multipart() # => False
email_message.get_content_type() # 'text/plain'
to_decode = email_message.get_payload(decode=True)
charset = email_message.get_content_charset()
# charset is utf-8
to_decode.decode(charset)

Exception:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 135: invalid start byte
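
For context, here is a small sketch of what this error means. The sample bytes are made up for illustration and are not taken from the actual message. In UTF-8, 0xa0 can only appear as a continuation byte (a non-breaking space is encoded as the two bytes 0xc2 0xa0), so a lone 0xa0 is rejected, while single-byte encodings such as ISO-8859-2 map it straight to U+00A0.

```python
# Hypothetical sample bytes: ASCII text with a lone 0xa0 byte in the middle.
raw = b"podany\xa0termin"

try:
    raw.decode("utf-8")
    error_offset, error_reason = None, None
except UnicodeDecodeError as exc:
    # exc.start is the offset of the offending byte, exc.reason the message text.
    error_offset, error_reason = exc.start, exc.reason

# In ISO-8859-2, 0xa0 is a non-breaking space (U+00A0),
# so the same bytes decode without raising.
as_latin2 = raw.decode("iso-8859-2")
```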

This is part of the bytes held in the to_decode variable:

b'Dzie\\u0144 dobry,\n\nniestety w podany'

Through trial and error I found that I can do the following:

test = b'Dzie\\u0144 dobry,\n\nniestety w podany'
test.decode('unicode-escape')
>> output: 'Dzień dobry,\n\nniestety w podany'

This output is correct, but I think there must be a better way than guessing the codec. How does my email client do this?
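
One possible explanation, offered as an assumption rather than a confirmed diagnosis: the message was parsed from a text-mode file with email.message_from_file, so the body was already a str, and get_payload(decode=True) had to turn it back into bytes. In the CPython versions I have looked at, that fallback uses the raw-unicode-escape codec, which would account for both the literal \u0144 and the bare 0xa0 byte. The sketch below parses the raw bytes instead (email.message_from_bytes; for a file opened with "rb", email.message_from_binary_file is the counterpart), so the declared charset is applied to the original bytes. RAW_MESSAGE and decode_text_body are hypothetical names used only for illustration.

```python
import email

# Hypothetical raw message, standing in for the bytes of a real .eml file.
RAW_MESSAGE = (
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: 8bit\n"
    "\n"
    "Dzie\u0144 dobry,\n\nniestety w podany\u00a0termin\n"
).encode("utf-8")


def decode_text_body(raw_bytes):
    """Parse raw message bytes and return the text body as str."""
    msg = email.message_from_bytes(raw_bytes)
    # With a bytes-based parse, decode=True returns the original body bytes
    # (after undoing base64 or quoted-printable, if either was used).
    payload = msg.get_payload(decode=True)
    # Assumption: fall back to UTF-8 when the message declares no charset.
    charset = msg.get_content_charset() or "utf-8"
    # errors="replace" keeps one stray byte from aborting the whole decode.
    return payload.decode(charset, errors="replace")


body = decode_text_body(RAW_MESSAGE)
```

If this is indeed the cause, the unicode-escape workaround only appears to work because the bytes were produced by the matching escape codec; applied to genuine UTF-8 bytes it would garble every non-ASCII character.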

python email mime

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
