Print lines when value changes in specific column

I have a file:

A 48
B 24
C 1
D 7
E 25
F 47
G 14
H 2
I 1

I would like to print the lines where the second column reaches its lowest value, then its greatest value, and so on (including the first and the last lines). In other words, the values alternate: 1st value -> lowest value -> greatest -> lowest .. -> last:

A 48
C 1
F 47
I 1

I would like to use awk. I can print the line with the greatest (or smallest) value:

awk 'NR == 1 || $2 > max {number = $0; max = $2} END {if (NR) print number, max}' file.txt

but not all the lines corresponding to the variations.

Any help?

awk


Solution 1:[1]

First things first: thanks for the nice question (keep it up).

With your shown samples, please try the following generic awk program. It reads Input_file twice: the first pass finds the minimum value of the 2nd field, and the second pass prints, at each occurrence of that minimum, the line holding the maximum seen since the previous occurrence, followed by the current line.

awk -v min="" '
FNR==NR{
  min=(min<$2?(min==""?$2:min):$2)
  next
}
$2==min{
  print arr[max] ORS $0
  prevMax=max=""
}
{
  max=(max>$2?max:$2)
  if(prevMax!=max){
    arr[max]=$0
  }
  prevMax=max
}
'  Input_file  Input_file

With your shown samples, output will be as follows:

A 48
C 1
F 47
I 1

Explanation: a detailed explanation of the above code.

awk -v min="" '                     ##Starting awk program from here, setting min to an empty string.
FNR==NR{                            ##Condition FNR==NR is TRUE only while Input_file is being read the first time.
  min=(min<$2?(min==""?$2:min):$2)  ##Getting minimum value of 2nd field among all the lines of Input_file.
  next                              ##next skips all further statements for this line.
}
$2==min{                            ##Checking condition if 2nd field is equal to min then do following.
  print arr[max] ORS $0             ##Printing the line stored in arr under index max, then ORS, then current line.
  prevMax=max=""                    ##Nullifying prevMax and max here.
}
{
  max=(max>$2?max:$2)               ##If current max is greater than $2 then keep max, else assign $2 to it.
  if(prevMax!=max){                 ##If prevMax is NOT equal to max then do following.
    arr[max]=$0                     ##Saving current line in arr with index of max.
  }
  prevMax=max                       ##Setting prevMax to max here.
}
'  Input_file  Input_file           ##Mentioning Input_file name twice so it is read in two passes.
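
To see what the first pass contributes, it can be tried on its own. This is only a minimal sketch, not the code from the answer: global_min is a hypothetical shell function name, and it assumes the values are numeric and sit in the 2nd field.

```shell
# Sketch: first pass only, reporting the smallest value of the 2nd field.
# global_min is a made-up helper name for illustration.
global_min() {
  awk '
    NR == 1 || $2 + 0 < min { min = $2 + 0 }   # first line, or a new smaller value
    END { if (NR) print min }                  # print nothing for empty input
  ' "$@"
}

# Usage with a few of the sample lines; prints 1
printf '%s\n' 'A 48' 'B 24' 'C 1' 'D 7' | global_min
```

The second pass then only has to compare each line's 2nd field with this one number.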

Solution 2:[2]

Assumptions:

  • if the 2nd field were printed on a graph then the objective is to print the input lines that correspond to the peaks and troughs on the graph
  • start by looking for a trough (ie, 2nd field values trending lower)
  • peaks and troughs are determined when the trend of the 2nd field value changes direction

One awk idea:

awk '
BEGIN  { dir=1 }                          # set trend direction: dir==1 => looking for trough; dir== -1 looking for peak
FNR==1 { print; prev2=$2; next }          # always print the 1st line
       { if ( ($2*dir > prev2*dir) ) {    # if we just switched the trend direction then ...
            if (FNR>2) print prevline     # print the previous line (as long as FNR>2) and ...
            dir*=-1                       # toggle the trend direction
         }
         prev2=$2                         # update our "previous" variables
         prevline=$0
       }
END    { print }                          # always print the last line
' file.txt

NOTES:

  • from a testing perspective if our > test comes back true then the trend has changed and we just saw a trough (if dir==1) or peak (if dir== -1)
  • we can use the same test (>) when looking for peaks and troughs by simply inverting the sign of the values we're comparing; the direction variable flip-flops between 1 and -1 as the trend direction changes which effectively flip-flops our test between > and <
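
The sign trick from the notes can be checked in isolation. A small sketch, where past_turn and flip_demo are hypothetical names used only here and the numbers are taken from the sample data:

```shell
# Sketch: one ">" test serves both directions once both sides are
# multiplied by dir (1 or -1). Function names are made up for illustration.
flip_demo() {
  awk '
    function past_turn(cur, prev, dir) { return (cur * dir > prev * dir) }
    BEGIN {
        print past_turn(7, 1, 1)      # dir=1:  same as 7 > 1    -> 1
        print past_turn(7, 1, -1)     # dir=-1: same as 7 < 1    -> 0
        print past_turn(14, 47, -1)   # dir=-1: same as 14 < 47  -> 1
    }'
}

flip_demo
```

The first call corresponds to D 7 following C 1 (a trough was just passed), the last one to G 14 following F 47 (a peak was just passed).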

This generates:

A 48
C 1
F 47
I 1

A few modifications to the input file:

$ cat file.txt
A 48
AA 49
B 24
C 1
D 7
E 25
F 47
G 14
H 27
I 1

The awk script generates:

A 48
AA 49
C 1
F 47
G 14
H 27
I 1

A more verbose version:

awk '
BEGIN         { trend="down" }
FNR==1        { print; prev2=$2; next }
trend=="down" { if ( ($2 > prev2) ) {             # just found a trough?
                   if (FNR>2) print prevline
                   trend="up"
                }
              }
trend=="up"   { if ( ($2 < prev2) ) {             # just found a peak?
                   if (FNR>2) print prevline
                   trend="down"
                }
              }
              { prev2=$2; prevline=$0 }
END           { print }
' file.txt

Solution 3:[3]

Assuming you do not want contiguous lines with the same high/low value printed, then using any awk in any shell on every Unix box you could do:

$ cat tst.awk
{
    prev2 = curr2
    prev0 = curr0
    curr2 = $2
    curr0 = $0
}

NR == 1 {
    print curr0
}

NR > 1 {
    if ( curr2 > prev2 ) {
        if ( dir == "dn" ) {
            print prev0
        }
        dir = "up"
    }

    if ( curr2 < prev2 ) {
        if ( dir == "up" ) {
            print prev0
        }
        dir = "dn"
    }
}

END {
    print curr0
}

$ awk -f tst.awk file
A 48
C 1
F 47
I 1
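
If the values live in some other column, the same direction-tracking idea can take the column number as a parameter. This is a rough sketch under that assumption, not part of the answer: turning_points, col, prev and prevline are names invented here, and unlike the script above it prints a one-line file only once.

```shell
# Sketch: print first line, every peak/trough of column N, and last line.
# Usage: turning_points N [file...]   (helper name is hypothetical)
turning_points() {
  col=$1; shift
  awk -v col="$col" '
    NR == 1 { print }                                   # always print the first line
    NR > 1 {
        cur = $col + 0
        if (cur > prev)      { if (dir == "dn") print prevline; dir = "up" }
        else if (cur < prev) { if (dir == "up") print prevline; dir = "dn" }
    }
    { prev = $col + 0; prevline = $0 }                  # remember this line
    END { if (NR > 1) print prevline }                  # always print the last line
  ' "$@"
}

# With the sample data in column 2 this prints: A 48, C 1, F 47, I 1
printf '%s\n' 'A 48' 'B 24' 'C 1' 'D 7' 'E 25' 'F 47' 'G 14' 'H 2' 'I 1' |
  turning_points 2
```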

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1
Solution 2
Solution 3    Ed Morton