How to customize unidecode?
I'm using the unidecode module to replace non-ASCII characters with their ASCII equivalents. However, there are some characters, for example Greek letters and symbols like Å, that I want to preserve. How can I achieve this?
For example,
from unidecode import unidecode
test_str = 'α, Å ©'
unidecode(test_str)
gives the output a, A (c), while what I want is α, Å (c).
Solution 1:[1]
Run unidecode on each character individually, and keep a whitelist set of characters that bypass it.
>>> import string
>>> import unidecode
>>> whitelist = set(string.printable + 'αÅ')
>>> test_str = 'α, Å ©'
>>> ''.join(ch if ch in whitelist else unidecode.unidecode(ch) for ch in test_str)
'α, Å (c)'
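The whitelist above only covers the characters you list by hand. If you want to keep every Greek letter, as the question asks, one option is to test each character's Unicode name instead. The following is a minimal sketch of that idea, not part of the original answer: the names `is_greek` and `selective_transliterate` are hypothetical helpers, and the transliteration function is passed in as an argument (you would normally pass `unidecode.unidecode`).

```python
import unicodedata


def is_greek(ch):
    """Return True if the character's Unicode name starts with 'GREEK'.

    Hypothetical helper: unicodedata.name() is given '' as a default so
    that unnamed characters (e.g. control characters) do not raise.
    """
    return unicodedata.name(ch, '').startswith('GREEK')


def selective_transliterate(text, transliterate, keep=frozenset(), keep_if=is_greek):
    """Transliterate text character by character, preserving some characters.

    A character is kept unchanged when it is ASCII, is in `keep`, or
    satisfies `keep_if`; otherwise it is replaced by transliterate(ch).
    `transliterate` is assumed to be a str -> str function such as
    unidecode.unidecode.
    """
    return ''.join(
        ch if ch.isascii() or ch in keep or keep_if(ch) else transliterate(ch)
        for ch in text
    )
```

With unidecode installed, usage would look like this:

>>> from unidecode import unidecode
>>> selective_transliterate('α, Å ©', unidecode, keep={'Å'})
'α, Å (c)'

Note that `str.isascii()` requires Python 3.7 or later, and that working per character means unidecode never sees more than one character at a time, the same trade-off as in the whitelist approach above.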
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
