Only print if the number in a field is greater than a value with awk

I'm still new to awk; what am I doing wrong? Apologies for the poor description, I have reformulated it.

Goal

Only print the number in the second field if that number is > 20

lorem v3  <--- no print
ipsum v5  <--- no print
text v21  <--- print "21"
expla v12 <--- no print

My attempt that does not work

awk ' { sub("^v","",$2); if ( $2 > 20 ) print $2 } '


Solution 1:[1]

Addressing OP's question about why the current code outputs 3 (for the input line lorem v3):

Initially $2 is just a field read from the input; awk only treats a field as a number if it looks like one, and v3 does not.

The sub() call (a string function) stores its result back into $2 as a string, which means $2 is treated as a string for the rest of the processing of that record.

This leads to $2 > 20 being treated as a string comparison ('3' > '20'), and since the string '3' sorts after the string '20' (the character 3 comes after the character 2), a 3 is output.
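A minimal way to see this outside the original script, using a BEGIN block so no input is needed. The variable s is a hypothetical stand-in for $2 after sub(); it holds the string "3":

```shell
awk 'BEGIN {
  s = "3"            # a string value, like $2 after sub("^v","",$2)
  print (s > 20)     # string vs number: compared as strings, "3" > "20" -> 1
  print (3 > 20)     # two numbers: compared numerically -> 0
}'
```

The parentheses around each comparison matter: without them, print s > 20 would be parsed as output redirection to a file named 20.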

To force a numeric comparison, we need a way to make awk re-evaluate $2 as a number. One method is to add zero, i.e., $2+0. Making this one change to OP's current code:

$ echo "lorem v3" | awk ' { sub("^v","",$2); if ( $2+0 > 20 ) print $2 } '
           <<< no output

NOTE: for more details see GNU awk - variable typing
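The same typing rule explains why a field taken straight from the input can compare numerically while the value sub() writes back cannot. A small sketch contrasting the two, using the hypothetical input lines text 21 and text v21:

```shell
# "21" comes straight from the input and looks numeric, so 21 > 3 is numeric.
echo "text 21" | awk '{ print ($2 > 3) }'                      # prints 1
# After sub(), $2 is a plain string "21", so this compares "21" with "3".
echo "text v21" | awk '{ sub(/^v/, "", $2); print ($2 > 3) }'  # prints 0
```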


Addressing the latest change to the question:

Sample input:

$ cat input.dat
lorem v3
ipsum v5
text v21
expla v12

Running our awk code (additional print added for clarification) against input.dat:

$ awk ' { print "######",$0; sub("^v","",$2); if ( $2+0 > 20 ) print $2 } ' input.dat
###### lorem v3
###### ipsum v5
###### text v21
21
###### expla v12
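If the threshold needs to change, the corrected command can be wrapped in a small shell function. This is only a sketch: the name print_over is hypothetical, and the limit is passed in with awk's -v option instead of being hard-coded as 20:

```shell
# Hypothetical helper: read "word vN" lines on stdin and print N when N > limit.
print_over() {
  awk -v limit="$1" '{
    sub(/^v/, "", $2)          # strip the leading "v" (leaves $2 a string)
    if ($2 + 0 > limit + 0)    # "+ 0" forces both sides to numbers
      print $2
  }'
}

printf '%s\n' "lorem v3" "ipsum v5" "text v21" "expla v12" | print_over 20
```

With the sample lines this prints 21, the same result as the one-liner. Adding zero to limit as well is defensive: a -v value that looks numeric already compares as a number, but the explicit + 0 makes the intent obvious.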

Solution 2:[2]

With your shown samples, please try the following. In short: the shell's echo command sends its output to awk on standard input. The awk program uses sub() to delete the first run of non-digit characters in the 2nd field (the leading v) and then checks whether $2+0 is greater than 20; if the substitution succeeded and the number is greater than 20, it prints that line's 2nd field.

echo "lorem v3" | awk 'sub(/[^0-9]+/,"",$2) && $2+0>20{print $2}'
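The condition relies on sub() returning the number of substitutions it made (0 or 1), so a 2nd field that contains no non-digit characters fails the first test and is skipped. A short sketch printing that return value next to the modified field, for a few hypothetical input lines:

```shell
printf '%s\n' "text v21" "text 21" "text v5" |
  awk '{ n = sub(/[^0-9]+/, "", $2); print n, $2 }'
# 1 21   <- "v" removed, so the && test goes on to check 21 > 20
# 0 21   <- nothing to remove, so sub() returns 0 and the line is skipped
# 1 5    <- "v" removed, but 5 is not > 20
```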

Solution 3:[3]

As an alternative, you could check whether the second field is a v followed by a number greater than 20 and, when printing, remove the first character.

The pattern for the digits matches 21-29, 30-99, or 100 and above (without leading zeros).

awk '
match($2, /^v(2[1-9]|[3-9][0-9]|[1-9][0-9]{2,})$/){
  print substr($2,2);
}' file

Output

21
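Some older awk implementations (older mawk releases, for example) do not support interval expressions such as {2,}. A sketch of the same check with [1-9][0-9]{2,} spelled out as [1-9][0-9][0-9]+, which matches the same strings, run against a few hypothetical boundary values. It uses the ~ operator instead of match(), since only a true/false test is needed:

```shell
printf '%s\n' v20 v21 v29 v30 v99 v100 v05 |
  awk '$1 ~ /^v(2[1-9]|[3-9][0-9]|[1-9][0-9][0-9]+)$/ { print substr($1, 2) }'
# prints 21, 29, 30, 99 and 100, one per line; v20 and v05 do not match
```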

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2
Solution 3 glenn jackman