How to sort a specified column in Linux
This is my two-column sequence file. I want to combine the two columns into one column and sort them in Linux, but I don't know how to write a shell script to do this.
GGCTGCAGCTAACAGGTGA TACTCGGGGAGCTGCGG
CCTCTGGCTCGCAGGTCATGGC CAGCGTCTTGCGCTCCT
GCTGCAGCTACATGGTGTCG CGCTCCGCTTCTCTCTACG
The sorted result should look like this (all values from the first column first, then all values from the second column, with the fields separated by "\t"):
1 GGCTGCAGCTAACAGGTGA
2 CCTCTGGCTCGCAGGTCATGGC
3 GCTGCAGCTACATGGTGTCG
4 TACTCGGGGAGCTGCGG
5 CAGCGTCTTGCGCTCCT
6 CGCTCCGCTTCTCTCTACG
What should I do?
Solution 1:[1]
You can do it easily in awk by storing the second column in an array and then outputting the saved values in the END rule, e.g.
awk '
{
    print ++n, $1           # output first column
    a[n] = $2               # save second column in array
}
END {
    j = n + 1               # j is next counter
    for (i=1;i<=n;i++)      # loop 1 - n
        print j++, a[i]     # output j and array value
}
' file.txt
Example Use/Output
With your input in file.txt in the current directory, you can copy and paste the command above into a terminal (e.g. with a middle-mouse paste in an xterm):
$ awk '
> {
>     print ++n, $1           # output first column
>     a[n] = $2               # save second column in array
> }
> END {
>     j = n + 1               # j is next counter
>     for (i=1;i<=n;i++)      # loop 1 - n
>         print j++, a[i]     # output j and array value
> }
> ' file.txt
1 GGCTGCAGCTAACAGGTGA
2 CCTCTGGCTCGCAGGTCATGGC
3 GCTGCAGCTACATGGTGTCG
4 TACTCGGGGAGCTGCGG
5 CAGCGTCTTGCGCTCCT
6 CGCTCCGCTTCTCTCTACG
Or as a 1-liner:
$ awk '{print ++n, $1; a[n]=$2} END {j=n+1; for (i=1;i<=n;i++) print j++, a[i]}' file.txt
1 GGCTGCAGCTAACAGGTGA
2 CCTCTGGCTCGCAGGTCATGGC
3 GCTGCAGCTACATGGTGTCG
4 TACTCGGGGAGCTGCGG
5 CAGCGTCTTGCGCTCCT
6 CGCTCCGCTTCTCTCTACG
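A note on the separator: the question asks for fields split by "\t", while `print` with a comma joins its arguments with awk's output field separator (OFS), which is a single space by default. As a variation on the commands above, the following sketch sets OFS to a tab. The sample file is recreated only so the snippet is self-contained, and the output name combined.tsv is an assumption for illustration.

```shell
# Recreate the sample input from the question so the snippet stands alone.
printf '%s\n' \
  'GGCTGCAGCTAACAGGTGA TACTCGGGGAGCTGCGG' \
  'CCTCTGGCTCGCAGGTCATGGC CAGCGTCTTGCGCTCCT' \
  'GCTGCAGCTACATGGTGTCG CGCTCCGCTTCTCTCTACG' > file.txt

# Same approach as the one-liner, but -v OFS='\t' makes "print a, b"
# join the counter and the sequence with a tab instead of a space.
# combined.tsv is a hypothetical output file name.
awk -v OFS='\t' '{print ++n, $1; a[n]=$2} END {for (i=1;i<=n;i++) print n+i, a[i]}' file.txt > combined.tsv

cat combined.tsv
```

The numbering is unchanged; only the character between the counter and the sequence differs.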
If you would like to create an awk script from the above, you can simply create the script file (say cmbcols.awk) as:
{
    print ++n, $1           # output first column
    a[n] = $2               # save second column in array
}
END {
    j = n + 1               # j is next counter
    for (i=1;i<=n;i++)      # loop 1 - n
        print j++, a[i]     # output j and array value
}
Then to run the script on the file file.txt you can do:
$ awk -f cmbcols.awk file.txt
1 GGCTGCAGCTAACAGGTGA
2 CCTCTGGCTCGCAGGTCATGGC
3 GCTGCAGCTACATGGTGTCG
4 TACTCGGGGAGCTGCGG
5 CAGCGTCTTGCGCTCCT
6 CGCTCCGCTTCTCTCTACG
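The array approach above keeps the whole second column in memory until the END rule runs. If the input is a regular file that can be read twice, a possible alternative (not part of the original answer, just a sketch) is to print column 1, then column 2, and number the combined stream with NR. The sample file is recreated so the snippet is self-contained, and the output name combined2.tsv is an assumption for illustration.

```shell
# Recreate the sample input from the question so the snippet stands alone.
printf '%s\n' \
  'GGCTGCAGCTAACAGGTGA TACTCGGGGAGCTGCGG' \
  'CCTCTGGCTCGCAGGTCATGGC CAGCGTCTTGCGCTCCT' \
  'GCTGCAGCTACATGGTGTCG CGCTCCGCTTCTCTCTACG' > file.txt

# Read the file twice: first pass emits column 1, second pass column 2.
# The final awk numbers every line of the combined stream with NR and
# separates the counter from the sequence with a tab.
# combined2.tsv is a hypothetical output file name.
{
    awk '{ print $1 }' file.txt
    awk '{ print $2 }' file.txt
} | awk -v OFS='\t' '{ print NR, $0 }' > combined2.tsv

cat combined2.tsv
```

This trades a second read of the file for not holding any column in memory, so it does not work when the data arrives on a pipe that can only be read once.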
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
