awk - understanding how FS works

I know that the default FS is " ", so why am I seeing different results from the following awk commands? Please help me understand.

>echo "   ABC DEF   XYZ  \n   abc       def,ghi   xyz   \n" | awk '{printf("nf: %s 1:%s line: %s\n", NF, $1, $0)}'
nf: 3 1:ABC line:    ABC DEF   XYZ  
nf: 3 1:abc line:    abc       def,ghi   xyz   
nf: 0 1: line: 

>echo "   ABC DEF   XYZ  \n   abc       def,ghi   xyz   \n" | awk -F" " '{printf("nf: %s 1:%s line: %s\n", NF, $1, $0)}'
nf: 3 1:ABC line:    ABC DEF   XYZ  
nf: 3 1:abc line:    abc       def,ghi   xyz   
nf: 0 1: line: 

>echo "   ABC DEF   XYZ  \n   abc       def,ghi   xyz   \n" | awk -F"[ ]" '{printf("nf: %s 1:%s line: %s\n", NF, $1, $0)}'
nf: 10 1: line:    ABC DEF   XYZ  
nf: 17 1: line:    abc       def,ghi   xyz   
nf: 0 1: line: 

>echo "   ABC DEF   XYZ  \n   abc       def,ghi   xyz   \n" | awk -F"[ ]*" '{printf("nf: %s 1:%s line: %s\n", NF, $1, $0)}'
nf: 5 1: line:    ABC DEF   XYZ  
nf: 5 1: line:    abc       def,ghi   xyz   
nf: 0 1: line: 

I want to understand why there are no empty tokens in the 1st and 2nd examples, but there are in the 3rd and 4th.

Update: To clarify my doubt further, awk seems to behave inconsistently between the default FS and a custom FS. See the examples below.

>printf "ab  cd\nef gh\n" | awk -F" " '{ printf("nf: %d\t", NF); for (i=1;i<=NF;i++) printf("%02d:%s\t", i, $i); print ""}'
nf: 2   01:ab   02:cd   
nf: 2   01:ef   02:gh

>printf "ab::cd\nef:gh\n" | awk -F":" '{ printf("nf: %d\t", NF); for (i=1;i<=NF;i++) printf("%02d:%s\t", i, $i); print ""}'
nf: 3   01:ab   02:     03:cd   
nf: 2   01:ef   02:gh


Solution 1:[1]

By default, FS is a single space, and a single space is a special case: leading and trailing blanks (spaces and tabs) and newlines are ignored, and any run of one or more of them is treated as a single separator rather than as several empty fields. That is why your 1st and 2nd examples have no empty fields: -F" " passes exactly that single space, so it behaves the same as the default. With any other single character, each occurrence of that character is a separator, so using ':' splits ":::my" into four fields (empty, empty, empty, "my"). See: GNU Awk User's Guide - 4.5.1 Whitespace Normally Separates Fields.
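
To see the two rules side by side, here is a minimal sketch, assuming a POSIX sh and an awk such as gawk or mawk. show_fields is a hypothetical helper introduced only for this demo; it prints NF and wraps every field in brackets so that empty fields are visible:

```shell
#!/bin/sh
# show_fields (hypothetical demo helper): print NF and every field in
# [brackets]. Extra arguments such as -F':' are passed through to awk.
show_fields() {
  awk "$@" '{
    printf "NF=%d", NF
    for (i = 1; i <= NF; i++) printf " [%s]", $i
    print ""
  }'
}

# Default FS (a single space): leading/trailing blanks are skipped and
# each run of blanks counts as one separator.
printf '   ABC DEF   XYZ  \n' | show_fields
# prints: NF=3 [ABC] [DEF] [XYZ]

# Any other single character, here ':': every occurrence is a separator,
# so the three leading colons yield three empty fields.
printf ':::my\n' | show_fields -F':'
# prints: NF=4 [] [] [] [my]
```

Passing "$@" straight through lets the same helper try any -F value you like.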

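Any FS longer than one character, such as "[ ]", is treated as a regular expression, and every match of that regex is a field separator, even when it matches only a single space. Leading and trailing whitespace is not skipped in this case, so a line that starts with a match gets an empty $1. See GNU Awk User's Guide - 4.5.2 Using Regular Expressions to Separate Fields.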
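
Applied to the "ab  cd" line from your update, a short sketch (same assumptions: POSIX sh, gawk or mawk) contrasts the special single-space FS with a regex that matches exactly one space:

```shell
#!/bin/sh
line='ab  cd'   # two spaces between ab and cd

# -F' ' is the default special case: the two spaces form one separator.
printf '%s\n' "$line" | awk -F' ' '{ print NF, "[" $2 "]" }'    # 2 [cd]

# -F'[ ]' is a regex matching one space, so there are two separators
# with an empty field between them.
printf '%s\n' "$line" | awk -F'[ ]' '{ print NF, "[" $2 "]" }'  # 3 []
```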

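To make every character a separate field, set FS to the empty string, either in the program with BEGIN { FS = "" } or on the command line with -v FS= . Note that -F"" does not work: the shell removes the quotes, so awk receives a bare -F and takes the next argument as the separator. An empty FS is an extension supported by gawk and mawk; POSIX leaves its behavior unspecified.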
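
A quick check of the empty-FS behavior, assuming gawk or mawk (other awks may differ, since POSIX does not define it):

```shell
#!/bin/sh
# Empty FS: each character of the record becomes its own field.
# Set it in BEGIN; -F"" would reach awk as a bare -F.
printf 'abc\n' | awk 'BEGIN { FS = "" } { print NF, $1, $3 }'   # 3 a c
```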

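In your examples that use the regex -F"[ ]", each single space is a separate field separator. FS is a regex there, not the default special case; it is a regex whose only character happens to be a space. That accounts for your counts: the first line contains 9 spaces, so it splits into 10 fields, and because it starts with a space, $1 is empty. The second line contains 16 spaces and therefore has 17 fields.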
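
One way to confirm that NF is simply the number of spaces plus one is to count the spaces with gsub(), which returns the number of replacements it made. A sketch, assuming the same input as in the question but written with printf so it does not depend on echo interpreting \n:

```shell
#!/bin/sh
# With -F'[ ]' every single space is a separator, so on each line NF
# should be exactly (number of spaces) + 1. gsub() runs on a copy of
# the line, so $0 and NF are left untouched.
printf '   ABC DEF   XYZ  \n   abc       def,ghi   xyz   \n' |
  awk -F'[ ]' '{ copy = $0; n = gsub(/ /, "", copy); print NF, n + 1 }'
# prints:
# 10 10
# 17 17
```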

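With the * (zero-or-more) repetition, FS is ambiguous: "[ ]*" can match the empty string, or as many spaces as there are in a row. Your output shows that your awk matches the whole run of spaces at each position and does not split on empty matches (otherwise "ABC" itself would be split into letters), so each run of spaces acts as one separator. Because a regex FS does not trim the ends, the leading run still produces an empty $1 and the trailing run an empty last field, which gives NF=5. The handling of empty matches may differ between awk implementations, so I do not recommend using an FS that can match the empty string.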

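awk understands Extended Regular Expression (ERE) syntax, so you can use the '+' repetition operator for one-or-more occurrences, e.g. -F"[ ]+". This removes the ambiguity, but unlike the default FS it still does not trim the ends: a line that begins with a space gets an empty $1, and a line that ends with one gets an empty last field.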
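
A sketch of "[ ]+" on the first line from the question, assuming gawk or mawk; the brackets make the empty first and last fields visible:

```shell
#!/bin/sh
# "[ ]+" collapses each run of spaces into one separator, but the
# leading and trailing runs still delimit empty fields.
printf '   ABC DEF   XYZ  \n' |
  awk -F'[ ]+' '{ printf "NF=%d", NF; for (i = 1; i <= NF; i++) printf " [%s]", $i; print "" }'
# prints: NF=5 [] [ABC] [DEF] [XYZ] []
```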

Keep the GNU Awk User's Guide handy. It is a good reference for gawk as well as the other flavors of awk. If something in the guide is unique to gawk, it is marked with a '#'. The guide usually explains (sometimes in a footnote) how gawk's behavior differs from POSIX awk, mawk, etc.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1