Print lines when the value changes in a specific column
I have a file:
A 48
B 24
C 1
D 7
E 25
F 47
G 14
H 2
I 1
I would like to print the lines where the second column reaches its lowest value, then its greatest value, and so on (including the first and the last lines). It looks like a variation: 1st value -> lowest -> greatest -> lowest ... -> last:
A 48
C 1
F 47
I 1
I would like to use awk. I can print the greatest (and smallest) values:
awk 'NR == 1 || $2 > max {number = $0; max = $2} END {if (NR) print number, max}' file.txt
but not all the lines corresponding to the variations.
Any help?
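For reference, the extremes attempt from the question can track both ends in a single pass once it reads column 2 ($2). This is only an illustrative sketch: the function name col2_extremes is hypothetical, the sample is fed through printf instead of file.txt, and ties keep the first matching line. It still reports only the global minimum and maximum, which is why the answers below follow the trend from line to line instead.

```shell
#!/bin/sh
# col2_extremes is a hypothetical helper name, not from the question.
# It prints the first line holding the smallest column-2 value and the
# first line holding the greatest one (strict < and > keep the first tie).
col2_extremes() {
    awk '
        NR == 1 || $2 + 0 < min { min = $2 + 0; minline = $0 }
        NR == 1 || $2 + 0 > max { max = $2 + 0; maxline = $0 }
        END { if (NR) { print "min:", minline; print "max:", maxline } }
    ' "$@"
}

printf '%s\n' 'A 48' 'B 24' 'C 1' 'D 7' 'E 25' 'F 47' 'G 14' 'H 2' 'I 1' | col2_extremes
```

On the sample this prints "min: C 1" and "max: A 48".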
Solution 1:[1]
First things first, thanks for a nice question (keep it up).
With your shown samples, please try the following awk program. It is a generic one: it reads the whole Input_file once to find the minimum value, then on a second pass prints, for each occurrence of that minimum, the line holding the greatest value seen since the previous occurrence of the minimum, followed by the minimum line itself.
awk -v min="" '
FNR==NR{
min=(min<$2?(min==""?$2:min):$2)
next
}
$2==min{
print arr[max] ORS $0
prevMax=max=""
}
{
max=(max>$2?max:$2)
if(prevMax!=max){
arr[max]=$0
}
prevMax=max
}
' Input_file Input_file
With your shown samples, the output will be as follows:
A 48
C 1
F 47
I 1
Explanation: a detailed, commented version of the above code.
awk -v min="" ' ##Starting awk program from here, setting min to an empty string here.
FNR==NR{ ##Checking condition FNR==NR which is TRUE only while Input_file is being read the first time.
min=(min<$2?(min==""?$2:min):$2) ##Getting minimum value among all the lines of Input_file.
next ##next will skip all further statements from here.
}
$2==min{ ##Checking condition if 2nd field is equal to min then do following.
print arr[max] ORS $0 ##Printing the line stored for the current max, then ORS and the current (minimum) line.
prevMax=max="" ##Nullifying prevMax and max here.
}
{
max=(max>$2?max:$2) ##Keeping max if it is greater than $2, else assigning $2 to max.
if(prevMax!=max){ ##If prevMax is NOT equal to max then do following.
arr[max]=$0 ##Storing the current line in arr, indexed by the new max value.
}
prevMax=max ##Setting prevMax to max here.
}
' Input_file Input_file ##Mentioning Input_file names here.
Solution 2:[2]
Assumptions:
- if the 2nd field were plotted on a graph, then the objective is to print the input lines that correspond to the peaks and troughs on the graph
- start by looking for a trough (i.e., 2nd field values trending lower)
- peaks and troughs are determined when the trend of the 2nd field value changes direction
One awk idea:
awk '
BEGIN { dir=1 } # set trend direction: dir==1 => looking for trough; dir== -1 looking for peak
FNR==1 { print; prev2=$2; next } # always print the 1st line
{ if ( ($2*dir > prev2*dir) ) { # if we just switched the trend direction then ...
if (FNR>2) print prevline # print the previous line (as long as FNR>2) and ...
dir*=-1 # toggle the trend direction
}
prev2=$2 # update our "previous" variables
prevline=$0
}
END { print } # always print the last line
' file.txt
NOTES:
- from a testing perspective, if our > test comes back true then the trend has changed and we just saw a trough (if dir==1) or a peak (if dir== -1)
- we can use the same test (>) when looking for peaks and troughs by simply inverting the sign of the values we're comparing; the dir (direction) variable flip-flops between 1 and -1 as the trend direction changes, which effectively flip-flops our test between > and <
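The sign-flip trick in the notes can be checked in isolation. This sketch uses a hypothetical function name, check_dir_trick, and a few (current, previous) value pairs taken from the sample input; it evaluates curr*dir > prev*dir for dir = 1 and dir = -1 and compares the result with a plain > or < test. The function's exit status is non-zero if any pair disagrees.

```shell
#!/bin/sh
# check_dir_trick is a hypothetical name used only for this illustration.
check_dir_trick() {
    awk 'BEGIN {
        # (current, previous) pairs of 2nd-field values taken from the sample input
        n = split("24 48 7 1 14 47", v, " ")
        bad = 0
        for (i = 1; i < n; i += 2) {
            curr = v[i] + 0; prev = v[i + 1] + 0
            for (dir = 1; dir >= -1; dir -= 2) {
                flipped = (curr * dir > prev * dir)                   # the single test Solution 2 uses
                direct  = (dir == 1) ? (curr > prev) : (curr < prev)  # the test it stands in for
                printf "curr=%s prev=%s dir=%2d flipped=%d direct=%d\n", curr, prev, dir, flipped, direct
                if (flipped != direct) bad = 1
            }
        }
        exit bad
    }'
}

check_dir_trick && echo "dir*value comparison matches > (dir=1) and < (dir=-1)"
```

Multiplying both sides by -1 reverses the inequality, which is why one comparison can serve both the trough search and the peak search.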
This generates:
A 48
C 1
F 47
I 1
A few modifications to the input file:
$ cat file.txt
A 48
AA 49
B 24
C 1
D 7
E 25
F 47
G 14
H 27
I 1
The awk script generates:
A 48
AA 49
C 1
F 47
G 14
H 27
I 1
A more verbose version:
awk '
BEGIN { trend="down" }
FNR==1 { print; prev2=$2; next }
trend=="down" { if ( ($2 > prev2) ) { # just found a trough?
if (FNR>2) print prevline
trend="up"
}
}
trend=="up" { if ( ($2 < prev2) ) { # just found a peak?
if (FNR>2) print prevline
trend="down"
}
}
{ prev2=$2; prevline=$0 }
END { print }
' file.txt
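Both scripts above read file.txt. To reproduce the two outputs without creating a file, the compact script can be wrapped in a shell function and fed from printf. This is a sketch: peaks_troughs is a hypothetical name, and the awk logic is Solution 2's, only reformatted with comments.

```shell
#!/bin/sh
# peaks_troughs is a hypothetical wrapper name; the awk logic is Solution 2's compact script.
peaks_troughs() {
    awk '
        BEGIN { dir = 1 }                        # 1 => looking for a trough, -1 => looking for a peak
        FNR == 1 { print; prev2 = $2; next }     # always print the first line
        {
            if ($2 * dir > prev2 * dir) {        # trend reversed: the previous line was a turning point
                if (FNR > 2) print prevline      # line 1 has already been printed
                dir *= -1                        # toggle the trend direction
            }
            prev2 = $2; prevline = $0            # remember this line for the next comparison
        }
        END { print }                            # common awks keep the last record in $0 here
    ' "$@"
}

printf '%s\n' 'A 48' 'B 24' 'C 1' 'D 7' 'E 25' 'F 47' 'G 14' 'H 2' 'I 1' | peaks_troughs
echo ---
printf '%s\n' 'A 48' 'AA 49' 'B 24' 'C 1' 'D 7' 'E 25' 'F 47' 'G 14' 'H 27' 'I 1' | peaks_troughs
```

The first run prints A 48, C 1, F 47, I 1 and the second prints A 48, AA 49, C 1, F 47, G 14, H 27, I 1, matching the outputs shown above.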
Solution 3:[3]
Assuming you do not want contiguous lines with the same high/low value printed, then using any awk in any shell on every Unix box you could do:
$ cat tst.awk
{
prev2 = curr2
prev0 = curr0
curr2 = $2
curr0 = $0
}
NR == 1 {
print curr0
}
NR > 1 {
if ( curr2 > prev2 ) {
if ( dir == "dn" ) {
print prev0
}
dir = "up"
}
if ( curr2 < prev2 ) {
if ( dir == "up" ) {
print prev0
}
dir = "dn"
}
}
END {
print curr0
}
$ awk -f tst.awk file
A 48
C 1
F 47
I 1
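To see what the assumption about contiguous equal values means in practice, the logic of tst.awk can be wrapped in a shell function and given a made-up input whose trough value 1 appears on two consecutive lines. In this sketch turning_points is a hypothetical name and the input is invented for illustration. Because neither the > nor the < branch fires when curr2 equals prev2, only the last line of the flat run is printed as the trough.

```shell
#!/bin/sh
# turning_points is a hypothetical wrapper name; the awk logic follows tst.awk.
turning_points() {
    awk '
        { prev2 = curr2; prev0 = curr0; curr2 = $2; curr0 = $0 }   # shift current -> previous
        NR == 1 { print curr0 }                                     # always print the first line
        NR > 1 {
            if (curr2 > prev2) { if (dir == "dn") print prev0; dir = "up" }   # a trough just ended
            if (curr2 < prev2) { if (dir == "up") print prev0; dir = "dn" }   # a peak just ended
            # equal values match neither branch, so a flat run is skipped over
        }
        END { print curr0 }                                         # always print the last line
    ' "$@"
}

# Made-up input: the trough value 1 repeats on lines B and C.
printf '%s\n' 'A 5' 'B 1' 'C 1' 'D 4' | turning_points
```

This prints A 5, C 1 and D 4: the trough is reported once, on C, the last line of the flat run.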
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | |
| Solution 3 | Ed Morton |
