Is Python's sort function the same as Linux's sort with LC_ALL=C?

I'm porting a Bash script to Python. The script sets LC_ALL=C and uses the Linux sort command to ensure the native byte order instead of locale-specific sort orders (http://stackoverflow.com/questions/28881/why-doesnt-sort-sort-the-same-on-every-machine).

In Python, I want to use Python's list sort() or sorted() functions (without the key= option). Will I always get the same results as Linux sort with LC_ALL=C?



Solution 1:[1]

Since you can supply a comparison function, you can make sure that the sort is equivalent to LC_ALL=C. From the docs, though, it looks like if all the characters are 7-bit, then it sorts in this manner by default; otherwise it uses locale-specific sorting.

If you have 8-bit or Unicode characters, then locale-specific sorting makes a lot of sense.
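
If you would rather not depend on the default behaviour, you can make the byte-wise comparison explicit by sorting on the encoded bytes. This is only a minimal sketch, assuming Python 3 and UTF-8 input; the helper name c_locale_sort is made up for illustration:

```python
def c_locale_sort(lines, encoding="utf-8"):
    """Sort strings by their encoded bytes, the way `LC_ALL=C sort` compares whole lines.

    Hypothetical helper; assumes every string can be encoded with `encoding`.
    """
    return sorted(lines, key=lambda s: s.encode(encoding))


words = ["zebra", "Apple", "\u00e9clair", "apple", "_tmp", "10", "9"]
result = c_locale_sort(words)
print(result)
```

Digits come first, then uppercase letters, then the underscore, then lowercase letters, and non-ASCII characters last, which is the order you would expect from plain byte values.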

Solution 2:[2]

In Python versions before 3, non-unicode strings are actually bytes. The sort function and methods do nothing to enforce a locale; locale-aware sorting has to be requested explicitly with a function from the locale module (for example locale.strcoll or locale.strxfrm).

unicode strings, and all strings in Python 3.x, are no longer bytes. Python 3 has a separate "bytes" type.
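
To illustrate the difference between the two types, here is a small sketch (assuming Python 3; the sample names are made up). A str compares by Unicode code point and bytes compares by byte value, and neither consults the locale. Because UTF-8 preserves code point order, the two orderings should agree for UTF-8 data:

```python
names = ["Zo\u00eb", "zoe", "\u00c9mile", "emile", "Zoe"]

# str objects compare by Unicode code point; no locale is consulted.
by_code_point = sorted(names)

# bytes objects compare by byte value, like strcmp() in the C locale.
by_bytes = sorted(name.encode("utf-8") for name in names)

# UTF-8 preserves code point order, so both orderings agree.
same_order = [b.decode("utf-8") for b in by_bytes] == by_code_point
print(by_code_point, same_order)
```

Note that all uppercase ASCII letters sort before all lowercase ones, and accented letters sort after both, which is usually not what a locale-aware sort would give you.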

Solution 3:[3]

I have been using International Components for Unicode (ICU), along with the PyICU bindings, to sort things with sorted() using my own locale (Catalan in my case). For example, to order a list of user profiles by their name property:

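import icu  # PyICU bindings; older releases exposed the module as PyICU
collator = icu.Collator.createInstance(icu.Locale('ca_ES.UTF-8'))
# Python 3 removed cmp=, so sort on the collator's sort key instead of collator.compare
sorted(user_profiles, key=lambda x: collator.getSortKey(x.name))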

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Petesh
Solution 2 Roman Susi
Solution 3 nabucosound