'Sorting a file using fields with specific value
Recently, I had to sort several files according to records' ID; the catch was that there can be several types of records, and in each of those the field I had to use for sorting is on a different position. The fields, however, are easily identifiable thanks to key=value structure. To show a simple sample of the general structure:
fieldA=valueA|fieldB=valueB|recordType=A|id=2|fieldC=valueC
fieldD=valueD|recordType=B|id=1|fieldE=valueE
fieldF=valueF|fieldG=valueG|fieldH=valueH|recordType=C|id=3
I came up with a pipeline as follows, which did the job:
awk -F'[|=]' '{for(i=1; i<=NF; i++) {if($i ~ "id") {i++; print $i"?"$0} }}' tester.txt | sort -n | awk -F'?' '{print $2}'
In other words the algorithm is as follows:
- Split the record by both field and key-value separators (
|and=) - Iterate through the elements and search for the
idkey - Print the next element (value of
idkey), a separator, and the whole line - Sort numerically
- Remove prepended identifier to preserve records' structure
Processing the sample gives the output:
fieldD=valueD|recordType=B|id=1|fieldE=valueE
fieldA=valueA|fieldB=valueB|recordType=A|id=2|fieldC=valueC
fieldF=valueF|fieldG=valueG|fieldH=valueH|recordType=C|id=3
Is there a way, though, to do this task using single awk command?
Solution 1:[1]
You may try this gnu-awk code to to this in a single command:
awk -F'|' '{
for(i=1; i<=NF; ++i)
if ($i ~ /^id=/) {
a[gensub(/^id=/, "", 1, $i)] = $0
break
}
}
END {
PROCINFO["sorted_in"] = "@ind_num_asc"
for (i in a)
print a[i]
}' file
fieldD=valueD|recordType=B|id=1|fieldE=valueE
fieldA=valueA|fieldB=valueB|recordType=A|id=2|fieldC=valueC
fieldF=valueF|fieldG=valueG|fieldH=valueH|recordType=C|id=3
We are using | as field delimiter and when there is a column name starting with id= we store it in array a with index as text after = and value as the full record.
Using PROCINFO["sorted_in"] = "@ind_num_asc" we sort array a using numerical value of index and then in for loop we print value part to get the sorted output.
Solution 2:[2]
Using GNU awk for the 3rd arg to match() and sorted_in:
$ cat tst.awk
match($0,/(^|\|)id=([0-9]+)/,a) {
ids2vals[a[2]] = $0
}
END {
PROCINFO["sorted_in"] = "@ind_num_asc"
for ( id in ids2vals ) {
print ids2vals[id]
}
}
$ awk -f tst.awk file
fieldD=valueD|recordType=B|id=1|fieldE=valueE
fieldA=valueA|fieldB=valueB|recordType=A|id=2|fieldC=valueC
fieldF=valueF|fieldG=valueG|fieldH=valueH|recordType=C|id=3
Solution 3:[3]
Try Perl: perl -e 'print map { s/^.*? //; $_ } sort { $a <=> $b } map { ($id) = /id=(\d+)/; "$id $_" } <>' file
Some explanation of the code I use:
print #print the resulting list of lines
map {
s/^.*? //;
$_
} #remove numeric id from start of line
sort { $a <=> $b } #sort numerically
map {
($id) = /id=(\d+)/;
"$id $_"
} # capture id and place it in start of line
<> # read all lines from file
Or try sed and sort: sed 's/^\(.*id=\([0-9][0-9]*\).*\)$/\2 \1/' file | sort -n | sed 's/^[^ ][^ ]* //'
Solution 4:[4]
With your shown samples only, please try following(awk + sort + cut) solution, written and tested in GNU awk, should work in any awk.
awk '
match($0,/id=[0-9]+/){
print substr($0,RSTART,RLENGTH)";"$0
}
' Input_file | sort -t'=' -k2n | cut -d';' -f2-
Explanation: Adding detailed explanation for above code.
awk ' ##Starting awk program from here.
match($0,/id=[0-9]+/){ ##Using awk match function to match id= followed by digits.
print substr($0,RSTART,RLENGTH)";"$0 ##printing sub string of matched value followed by current line along with semi-colon in it.
}
' Input_file | ##Mentioning Input_file here and passing awk output as a standard input to next command.
sort -t'=' -k2n | ##Sorting output with delimiter of = and by 2nd field then passing output to next command as an input.
cut -d';' -f2- ##Using cut command making delimiter as ; and printing everything from 2nd field onwards.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | |
| Solution 3 | |
| Solution 4 |
