Float comparison in awk and mawk

I cannot understand why floating-point comparison does not work in mawk:

mawk '$3 > 10' file.txt
[...]
9_6_F-repl      24834   38.8699
9_6_F   56523   17.9344
9_7_F   3196    3.68367
9_9_F   2278    2.37445
9_annua_M-merg  122663  163.557
9_huetii_F-merg 208077  172.775
[...]

While it works perfectly in awk like this:

awk '{if ($3 > 10) print $1}' file.txt

I'm obviously doing something wrong here, but I cannot understand what.



Solution 1:[1]

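It fails if the file has CRLF line terminators: the trailing \r becomes part of $3, so mawk no longer sees the field as a number and compares it as a string. Remove the \r first: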

$ file foo
foo: ASCII text, with CRLF line terminators
$ mawk 'sub(/\r/,"") && ($3 > 10)'  foo
9_6_F-repl      24834   38.8699
9_6_F   56523   17.9344
9_annua_M-merg  122663  163.557
9_huetii_F-merg 208077  172.775

Alternatively, convert the file first with dos2unix or a similar tool.
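A portable sketch of the same fix, using rows from the question as sample data (the printf input is constructed here to mimic a CRLF file, with one LF-only line added on purpose). It strips the CR in a separate action instead of in the pattern: sub() returns the number of substitutions it made, so a guard like sub(/\r/,"") && ... also drops every line that has no CR, e.g. in a file with mixed line endings. Only standard awk features are used, so it should behave the same under mawk or gawk.

```shell
#!/bin/sh
# Sample input, mostly with CRLF terminators; the last line ends in a plain LF.
crlf_input() {
    printf '9_6_F-repl\t24834\t38.8699\r\n'
    printf '9_7_F\t3196\t3.68367\r\n'
    printf '9_annua_M-merg\t122663\t163.557\n'
}

# Strip a trailing CR (if there is one) in its own action, then filter on $3.
# The sub() result is not used as a condition, so lines without a CR are
# still tested and printed.
out=$(crlf_input | awk '{ sub(/\r$/, "") } $3 > 10')
printf '%s\n' "$out"
```

With sub(/\r/,"") && ($3 > 10) instead, the 163.557 row would be missing from the output, because sub() returns 0 on that line.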

EDIT2: If you are using a locale that has a comma as the decimal separator, it also affects float comparisons in mawk.

In this case you can either:

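1) set the locale to one that uses a dot as the decimal separator, e.g.

LANG="en_US.UTF-8"

(LC_ALL and LC_NUMERIC, if set, take precedence over LANG, so LC_ALL=C is the most reliable override)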

or

2) change the decimal separators to commas and pipe the result to mawk (sed replaces only the first dot on each line, and the printed lines will contain commas):

sed 's/\./,/' file.txt | mawk '$3 > 10'
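If you would rather not change your session's locale at all, a minimal sketch is to override it for the single awk call only. LC_ALL takes precedence over LANG and the other LC_* variables, and the C locale always uses a dot as the decimal separator. The two input rows are taken from the question.

```shell
#!/bin/sh
# Run awk in the C locale for this one command only; the caller's locale
# settings stay unchanged. LC_ALL overrides LANG and LC_NUMERIC.
out=$(printf '9_6_F\t56523\t17.9344\n9_9_F\t2278\t2.37445\n' |
    LC_ALL=C awk '$3 > 10')
printf '%s\n' "$out"
```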

Solution 2:[2]

You don't need to change the locale, but you do need to account for strange or erroneous input:

If the input has a stray dot, or any character whose byte value is higher than ASCII "1" (which covers a LOT of characters), such as:

9_6_F-repl      24834   9.
9_6_F   56523   9.
9_annua_M-merg  122663  9.
9_huetii_F-merg 208077  9.
9_annua_M-merg  122663  :5.333

then this fails to produce the correct result, because $3 is being compared as a string, and ASCII "9" sorts after ASCII "1":

mawk2 'sub("\r*",_)*(10<$3)'

9_6_F-repl      24834   9.
9_6_F   56523   9.
9_annua_M-merg  122663  9.
9_huetii_F-merg 208077  9.
9_annua_M-merg  122663  :5.333

To fix it, put a unary + in front of $3, which forces a numeric comparison:

mawk 'sub("\r*",_)*(10<+$3)'
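A small sketch to see the effect of the unary + (the three rows are hypothetical, and LC_ALL=C is set only so that string comparison is plain byte order). Without the +, a third field that does not look like a number, such as :5.333, is compared with 10 as a string; with it, the field is first converted to a number from its leading numeric part, so :5.333 becomes 0 (and a value like 38.8699 followed by a CR would become 38.8699).

```shell
#!/bin/sh
# C locale: byte-order string comparison and "." as decimal separator.
export LC_ALL=C

# Hypothetical rows: two clean numbers and one junk value in column 3.
rows() {
    printf 'a\t1\t38.8699\n'
    printf 'b\t2\t3.68367\n'
    printf 'c\t3\t:5.333\n'
}

# Plain $3: ":5.333" does not look numeric, so it is compared with "10" as
# a string; ":" sorts after "1", so row c passes as well.
as_string=$(rows | awk '10 < $3 { print $1 }')

# Unary plus: $3 is converted to a number first (":5.333" -> 0).
as_number=$(rows | awk '10 < +$3 { print $1 }')

echo "plain \$3 keeps:" $as_string
echo "+\$3 keeps:" $as_number
```

One side note on the commands above: the regex "\r*" can match the empty string at the very start of the record, so sub("\r*",_) always returns 1 but leaves a trailing CR in place. That is harmless for the +$3 comparison, but the printed lines keep their CR; an anchored sub(/\r$/,"") removes it.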

If you don't care about gawk's archaic -P/-c/-t compatibility modes (a regex RS is an extension that gawk and mawk support but POSIX awk does not guarantee), it's even simpler:

mawk '10<+$3' RS='\r?\n'

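The optional \r is consumed as part of the record separator, so it never ends up in $3, and each record is printed with ORS, which defaults to \n (set ORS='\r\n' if you want CRLF output back). Because the ? makes the CR optional, the same command handles both LF and CRLF input, and you can skip the steps of running iconv or dos2unix or changing locale settings.

This way the original input file remains intact, in case you need those CRs later for some reason.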
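A sketch of the regex-RS approach on a small sample that mixes CRLF and LF lines (rows taken from the question, line endings chosen for illustration). RS is passed with -v here instead of as a trailing operand; either way, a multi-character RS is treated as a regular expression by gawk and mawk, so this assumes one of those two implementations.

```shell
#!/bin/sh
# Mixed line endings: the first two records end in CRLF, the last in LF.
sample() {
    printf '9_6_F-repl\t24834\t38.8699\r\n'
    printf '9_7_F\t3196\t3.68367\r\n'
    printf '9_annua_M-merg\t122663\t163.557\n'
}

# RS='\r?\n' consumes an optional CR together with the LF, so $3 never
# carries a CR; records are printed with the default ORS ("\n").
out=$(sample | awk -v RS='\r?\n' '10 < +$3')
printf '%s\n' "$out"
```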

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

[1] Solution 1: Tom S
[2] Solution 2: RARE Kpop Manifesto