Float comparison in awk and mawk
I cannot understand why the float number comparison does not work in mawk:
mawk '$3 > 10' file.txt
[...]
9_6_F-repl 24834 38.8699
9_6_F 56523 17.9344
9_7_F 3196 3.68367
9_9_F 2278 2.37445
9_annua_M-merg 122663 163.557
9_huetii_F-merg 208077 172.775
[...]
While it works perfectly in awk like this:
awk '{if ($3 > 10) print $1}' file.txt
I'm obviously doing something wrong here, but I cannot understand what.
Solution 1:[1]
It fails if the file has CRLF line terminators. Remove the \r first:
$ file foo
foo: ASCII text, with CRLF line terminators
$ mawk '{ sub(/\r$/, "") } $3 > 10' foo
9_6_F-repl 24834 38.8699
9_6_F 56523 17.9344
9_annua_M-merg 122663 163.557
9_huetii_F-merg 208077 172.775
Alternatively, you could use dos2unix or a similar tool.
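As a small reproducible sketch of the CR-stripping approach: the two sample rows are copied from the question, while the temporary file and the `result` variable are made up for illustration. It uses plain `awk` so that it runs on most systems; substituting `mawk` should behave the same way.

```shell
# Build a small CRLF sample (rows taken from the question's data).
tmp=$(mktemp)
printf '9_6_F 56523 17.9344\r\n9_7_F 3196 3.68367\r\n' > "$tmp"

# Strip a trailing CR from every record in an action block, then compare.
# Because the sub() result is not part of the condition, lines that have
# no CR at all are still tested against the threshold.
result=$(awk '{ sub(/\r$/, "") } $3 > 10 { print $1 }' "$tmp")
echo "$result"    # prints: 9_6_F

rm -f "$tmp"
```

Modifying $0 with sub() makes awk re-split the fields, so $3 no longer carries the CR when the comparison runs.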
EDIT2: If you are using a locale that has a comma as the decimal separator, it affects float comparisons in mawk.
In this case you can either:
1) set locale to
LANG="en_US.UTF-8"
or
2) change the decimal separators to commas and pipe the result to mawk (this sed replaces only the first dot on each line, which is the one in $3 for this data):
sed 's/\./,/' file.txt | mawk '$3 > 10'
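If changing LANG for the whole session is undesirable, the locale can also be overridden for a single command. A minimal sketch, assuming the comma-locale problem described above: LC_ALL=C selects the POSIX locale, whose decimal separator is a dot. The temporary file and the `locale_result` variable are illustrative only, and plain `awk` stands in for `mawk` so the snippet runs anywhere.

```shell
# Sample rows from the question, written with dot decimal separators.
tmp=$(mktemp)
printf '9_6_F 56523 17.9344\n9_7_F 3196 3.68367\n' > "$tmp"

# The VAR=value prefix applies to this one command only;
# the surrounding shell keeps its own locale settings.
locale_result=$(LC_ALL=C awk '$3 > 10 { print $1 }' "$tmp")
echo "$locale_result"    # prints: 9_6_F

rm -f "$tmp"
```

LC_ALL takes precedence over both LANG and LC_NUMERIC, which is why it is used here instead of LANG.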
Solution 2:[2]
You don't need to set the locale, but you do need to account for strange or erroneous input :
If $3 has a stray dot, or any character that has a byte value higher than ASCII "1" (which is a LOT of stuff) :
9_6_F-repl 24834 9.
9_6_F 56523 9.
9_annua_M-merg 122663 9.
9_huetii_F-merg 208077 9.
9_annua_M-merg 122663 :5.333
this would completely fail to produce the correct result, since $3 is being compared as a string, where an ASCII "9" is larger than ASCII "1" :
mawk2 'sub("\r*",_)*(10<$3)'
9_6_F-repl 24834 9.
9_6_F 56523 9.
9_annua_M-merg 122663 9.
9_huetii_F-merg 208077 9.
9_annua_M-merg 122663 9.
9_annua_M-merg 122663 :5.333
To rectify it, simply add a unary + in front of $3, which forces a numeric comparison :
mawk 'sub("\r*",_)*(10<+$3)'
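To see the effect of the unary + in isolation, here is a minimal sketch. The first row comes from the question and the other two from the malformed examples above; the temporary file and the `numeric_result` variable are hypothetical, and plain `awk` is used so it runs anywhere.

```shell
# One well-formed value and two malformed ones.
tmp=$(mktemp)
printf '%s\n' \
  '9_6_F-repl 24834 38.8699' \
  '9_6_F 56523 9.' \
  '9_annua_M-merg 122663 :5.333' > "$tmp"

# +$3 converts the field to a number before comparing:
# "9." becomes 9 and ":5.333" becomes 0, so neither exceeds 10.
numeric_result=$(awk '10 < +$3 { print $1 }' "$tmp")
echo "$numeric_result"    # prints: 9_6_F-repl

rm -f "$tmp"
```

Writing `$3 + 0` is a common alternative spelling of the same conversion.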
If you don't care much for archaic gawk -P/-c/-t modes then it's even simpler :
mawk '10<+$3' RS='\r?\n'
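Let RS and ORS take care of the \r (CR) on your behalf. By making the \r optional with ? in the RS regex, you can skip all the steps about using iconv or dos2unix or changing locale settings : the RS --> ORS round trip handles it seamlessly, since each line ending is consumed on input and ORS writes a plain newline on output.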
This way the original input file remains intact, in case you need those CRs later for some reason.
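A short sketch of the RS approach, under the assumption that the awk in use accepts a regular expression as RS. That is an extension found in gawk and mawk, not something POSIX guarantees. The temporary file and the `rs_result` and `cr_lines` variables are illustrative.

```shell
# CRLF sample built from two rows of the question's data.
tmp=$(mktemp)
printf '9_6_F 56523 17.9344\r\n9_7_F 3196 3.68367\r\n' > "$tmp"

# RS matches LF with an optional preceding CR, so records arrive
# without the CR; the default ORS then prints a plain newline.
rs_result=$(awk '10 < +$3' RS='\r?\n' "$tmp")
printf '%s\n' "$rs_result"

# The input file itself is untouched: both lines still contain a CR.
cr_lines=$(grep -c "$(printf '\r')" "$tmp")

rm -f "$tmp"
```

The only thing that changes is how awk splits records as it reads them, so nothing is rewritten on disk.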
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Tom S |
| Solution 2 | RARE Kpop Manifesto |
