How to parse a table with spaces using regex and add text? [closed]

I want to take this file output:

Summary

-I- Stage                               Warnings   Errors     Comment

-I- jamaico                                3          0

-I- jamaico1 Check                         0          0

-I- jamaico Check                          0          0

-I- jamaico Manager                        0          0

-I- jamaico Counters                       0          0

-I- jamaico Information                    16         0

-I- jamaico / Width checks                 0          15

-I- jamaico                                0          0

-I- jamaico Keys                           0          0

-I- jamaico Sensing                        0          0

-I- Create jamaico File                    0          0

and transform it using regex to this:

“jamaico Information section has 16 Warnings and 0 Errors”

“jamaico / Width checks section has 0 Warnings and 15 Errors”

and so on, for each line.

I've tried using awk, but I can't manage to get the stage section without the word "Summary" and I don't know how to properly insert the words "section has" and "errors" in the right place.

I've only made it this far:

cat file.txt | awk -F' '"{2}" '/^-I-/{print $1}'
-I- Stage
-I- jamaico
-I- jamaico1 Check
-I- jamaico Check
-I- jamaico Manager
-I- jamaico Counters
-I- jamaico Information
-I- jamaico / Width checks
-I- jamaico
-I- jamaico Keys
-I- jamaico Sensing
-I- Create jamaico File

Any suggestions?

Thanks



Solution 1:[1]

This is one way to fix your formatting:

awk '/^-I-/&&!/^-I- Stage/ {
    $0=substr($0,1,62);
    s=substr($0,5,36);
    gsub(/ *$/,"",s);
    printf("\"%s section has %d Warnings and %d Errors\"\n",s,$(NF-1),$NF);
}' file.txt
  • /^-I-/&&!/^-I- Stage/ - Match on lines starting with -I- but not -I- Stage.
  • $0=substr($0,1,62); - Remove the Comment column from the input by only keeping the first 62 characters of the line.
  • s=substr($0,5,36); - Take the 36-character substring of the line starting at position 5, i.e. the Stage column after the -I- prefix.
  • gsub(/ *$/,"",s); - In the string s, replace the trailing spaces (/ *$/) with "" (nothing)
  • printf
    • \" - A literal "
    • %s - A string (which is taken from the s variable we created above)
    • %d - An integer, where $(NF-1) is the second-to-last field and $NF is the last field.

Output:

"Discovery section has 3 Warnings and 0 Errors"
"Lids Check section has 0 Warnings and 0 Errors"
"Links Check section has 0 Warnings and 0 Errors"
"Subnet Manager section has 0 Warnings and 0 Errors"
"Port Counters section has 0 Warnings and 0 Errors"
"Nodes Information section has 16 Warnings and 0 Errors"
"Speed / Width checks section has 0 Warnings and 15 Errors"
"Virtualization section has 0 Warnings and 0 Errors"
"Partition Keys section has 0 Warnings and 0 Errors"
"Temperature Sensing section has 0 Warnings and 0 Errors"
"Create IBNetDiscover File section has 0 Warnings and 0 Errors"

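If hard-coding the offsets 5, 36 and 62 feels fragile, the column positions can also be read from the header line. The following is only a sketch, assuming that the "-I- Stage" header precedes the data rows and that every value starts in the same column as its header label. summarize_by_header is a hypothetical helper name, not part of the original answer.

```shell
# Sketch: derive the column offsets from the header instead of hard-coding them.
# Assumption: values are left-aligned under the "Warnings"/"Errors" labels.
summarize_by_header() {
  awk '
    /^-I- Stage/ {
      w = index($0, "Warnings")   # start column of the Warnings field
      e = index($0, "Errors")     # start column of the Errors field
      c = index($0, "Comment")    # start column of Comment, 0 if absent
      next
    }
    /^-I- / && w {
      name = substr($0, 5, w - 5)            # stage name between prefix and Warnings
      sub(/ +$/, "", name)                   # strip the padding
      warn = substr($0, w, e - w) + 0        # "+ 0" converts the padded text to a number
      err  = (c ? substr($0, e, c - e) : substr($0, e)) + 0
      printf "\"%s section has %d Warnings and %d Errors\"\n", name, warn, err
    }
  ' "$@"
}
```

It would be called as summarize_by_header file.txt. Rows that appear before the header are ignored because w is still unset at that point.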
Solution 2:[2]

You are working with fixed-width columns, and GNU AWK has a feature for such files: FIELDWIDTHS. Let the content of file.txt be

Summary

-I- Stage                               Warnings   Errors     Comment

-I- Discovery                           3          0

-I- Lids Check                          0          0

-I- Links Check                         0          0

-I- Subnet Manager                      0          0

-I- Port Counters                       0          0

-I- Nodes Information                   16         0

-I- Speed / Width checks                0          15

-I- Virtualization                      0          0

-I- Partition Keys                      0          0

-I- Temperature Sensing                 0          0

-I- Create IBNetDiscover File           0          0

then

awk 'BEGIN{FIELDWIDTHS="4 36 11 11"}NR>3&&($3||$4){sub(/ +$/,"",$2);print $2 " section has " $3+0 " Warnings and " $4+0 " Errors"}' file.txt

output

Discovery section has 3 Warnings and 0 Errors
Nodes Information section has 16 Warnings and 0 Errors
Speed / Width checks section has 0 Warnings and 15 Errors

Explanation: First I tell GNU AWK that the 1st column is 4 characters wide, the 2nd 36, the 3rd 11 and the 4th 11. Then, for each line after the 3rd (the first three lines are just the header), if the 3rd or 4th column holds a non-zero value, I remove the trailing spaces from the 2nd column and concatenate the strings to print the desired message. Rows where both counts are zero are therefore skipped. Adding zero converts these fields from strings to numbers, which drops their trailing spaces.

(tested in gawk 4.2.1)
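FIELDWIDTHS is specific to GNU AWK. Where only a POSIX awk is available, the same fixed-width split can be approximated with substr. This is a minimal sketch, assuming the same 4/36/11/11 layout as above; summarize is a hypothetical function name, and the offsets 5, 41 and 52 are simply the running sums of those widths plus one.

```shell
# Sketch: emulate FIELDWIDTHS="4 36 11 11" with substr in any POSIX awk.
summarize() {
  awk '
    /^-I- / && !/^-I- Stage/ {
      name = substr($0, 5, 36)        # 2nd column: stage name
      warn = substr($0, 41, 11) + 0   # 3rd column, "+ 0" makes it numeric
      err  = substr($0, 52, 11) + 0   # 4th column
      sub(/ +$/, "", name)            # strip the padding
      if (warn || err)                # keep only rows with a non-zero count
        print name " section has " warn " Warnings and " err " Errors"
    }
  ' "$@"
}
```

Called as summarize file.txt, it skips the header by pattern rather than by line number, so blank lines in the input do not matter.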

Solution 3:[3]

Using sed (the \+ quantifier is a GNU sed extension)

$ sed 's/ \+/ /g;1,3d; s/.*- \([^0-9]\+\) \([^ ]*\) \(.*\)/"\1 section has \2 Warnings and \3 Errors"/' input_file

"Discovery section has 3 Warnings and 0 Errors"

"Lids Check section has 0 Warnings and 0 Errors"

"Links Check section has 0 Warnings and 0 Errors"

"Subnet Manager section has 0 Warnings and 0 Errors"

"Port Counters section has 0 Warnings and 0 Errors"

"Nodes Information section has 16 Warnings and 0 Errors"

"Speed / Width checks section has 0 Warnings and 15 Errors"

"Virtualization section has 0 Warnings and 0 Errors"

"Partition Keys section has 0 Warnings and 0 Errors"

"Temperature Sensing section has 0 Warnings and 0 Errors"

"Create IBNetDiscover File section has 0 Warnings and 0 Errors"

Solution 4:[4]

You could set the field separator to 2 or more spaces.

Then check that the first field starts with -I- and that the second field is not "Warnings", which excludes the header line.

For the lines that pass both checks, print the part of field 1 that follows the -I- prefix, plus the desired text with the values of fields 2 and 3.

awk -F"[[:space:]]{2,}" '
match($1, /^-I- /) && $2 != "Warnings" {
  print "\""substr($1, RLENGTH + 1), "section has", $2, "Warnings and", $3, "Errors\""
}
' file

Output

"Discovery section has 3 Warnings and 0 Errors"
"Lids Check section has 0 Warnings and 0 Errors"
"Links Check section has 0 Warnings and 0 Errors"
"Subnet Manager section has 0 Warnings and 0 Errors"
"Port Counters section has 0 Warnings and 0 Errors"
"Nodes Information section has 16 Warnings and 0 Errors"
"Speed / Width checks section has 0 Warnings and 15 Errors"
"Virtualization section has 0 Warnings and 0 Errors"
"Partition Keys section has 0 Warnings and 0 Errors"
"Temperature Sensing section has 0 Warnings and 0 Errors"
"Create IBNetDiscover File section has 0 Warnings and 0 Errors"

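The {2,} interval in the field separator is not understood by every awk (some older mawk releases, for example, lack interval expressions). A possible workaround is to spell "two or more spaces" as one space followed by " +". The sketch below assumes that stage names never contain two consecutive spaces; summarize_portable is a hypothetical name, and it prints "Warnings" capitalised as in the question.

```shell
# Sketch: split on runs of two or more spaces without an interval expression.
summarize_portable() {
  awk -F'  +' '
    /^-I- / && $2 != "Warnings" {      # data rows only, header excluded
      name = $1
      sub(/^-I- /, "", name)           # drop the "-I- " prefix
      print "\"" name " section has " $2 " Warnings and " $3 " Errors\""
    }
  ' "$@"
}
```

Usage is the same as for the command above: summarize_portable file.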
Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Daweo
Solution 3 HatLess
Solution 4 The fourth bird