Python3 DEFAULT (implicit) unicode UTF-8 normalization: how to?

Yes, this is far from the first question about Unicode normalization in Python.

As many people know, there are Unicode letters that look the "same" but are "not the same" (even the string lengths differ!):

In [1]: s='å'                                                                                        

In [2]: import unicodedata                                                                            

In [3]: q=unicodedata.normalize('NFD', s)

In [4]: q
Out[4]: 'å'

In [5]: s
Out[5]: 'å'

In [6]: s == q
Out[6]: False

In [7]: len(s), len(q)
Out[7]: (1, 2)
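
For reference, a minimal sketch of why the two strings differ and how an explicit normalization makes them compare equal again. The helper name same_text is only an illustration, not part of any library:

```python
import unicodedata

composed = "\u00e5"      # 'å' as a single code point (NFC form)
decomposed = "a\u030a"   # 'a' followed by COMBINING RING ABOVE (NFD form)


def same_text(a, b, form="NFC"):
    """Compare two strings after bringing both to the same normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# The raw strings differ, both in content and in length.
assert composed != decomposed
assert (len(composed), len(decomposed)) == (1, 2)

# After normalization they are identical.
assert same_text(composed, decomposed)
```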

So the question is: is there a way to set a DEFAULT normalization (say, "NFC") for each and every .decode("utf-8") call?

I mean, can I add a hook (or whatever) to really normalize any input?

I ran into this problem while handling input from different browsers in a backend API… Sometimes the input arrives in NFD (for an unknown reason) and makes searches fail.

PS. I do not want to "fix" each and every input routine to re-normalize what it gets (I have already done that and dislike it).

PPS. I'd like to have something like

import sys
import unicodedata

class mystr(str):
    '''
    str(object='') -> str
    str(bytes_or_buffer[, encoding[, errors]]) -> str
    '''
    def __new__(cls, *av, **kw):
        # str is immutable, so the value must be normalized in __new__, not __init__
        norm = kw.pop('utf_normalize', 'NFC')
        text = str(*av, **kw)
        if kw.get('encoding', sys.getdefaultencoding()).lower() in ('utf-8', 'utf8'):
            text = unicodedata.normalize(norm, text)
        obj = super().__new__(cls, text)
        obj._default_norm = norm
        return obj
...
_builtin['str'] = mystr # "is this a real life, or just a fantasy"©…


Solution 1:[1]

No, there is no built-in explicit functionality to do this for you. However, Python lets you easily replace or wrap built-in classes, so it's not hard to build your own string type with the desired behavior.
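
One possible alternative to a custom string type, sketched here under assumptions: register a custom codec that decodes UTF-8 and then normalizes to NFC. This is not what the answer describes, and the codec name "utf-8-nfc" is made up for this example. It does not change what a plain .decode("utf-8") does; callers have to ask for the new codec name explicitly.

```python
import codecs
import unicodedata


def _nfc_decode(data, errors="strict"):
    # Decode as ordinary UTF-8, then normalize the resulting text to NFC.
    text, consumed = codecs.utf_8_decode(data, errors, True)
    return unicodedata.normalize("NFC", text), consumed


def _search(name):
    # Codec names arrive lowercased; newer Python versions also turn
    # hyphens into underscores, so accept both spellings.
    if name.replace("-", "_") == "utf_8_nfc":
        return codecs.CodecInfo(
            name="utf-8-nfc",            # hypothetical name, chosen for this sketch
            encode=codecs.utf_8_encode,  # encoding is plain UTF-8
            decode=_nfc_decode,
        )
    return None


codecs.register(_search)

raw = "a\u030a".encode("utf-8")   # NFD bytes, as a browser might send them
text = raw.decode("utf-8-nfc")    # decoded and normalized in one step
assert text == "\u00e5"
```

With this in place, input code can call data.decode("utf-8-nfc") wherever it previously called data.decode("utf-8"), which keeps the normalization in one place but still requires touching each call site.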

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 tripleee