How to sort a specified column in Linux

I have two columns of sequences. I want to combine them into a single column and sort them in Linux, but I don't know how to write the shell script to do it.

    GGCTGCAGCTAACAGGTGA         TACTCGGGGAGCTGCGG
    CCTCTGGCTCGCAGGTCATGGC      CAGCGTCTTGCGCTCCT
    GCTGCAGCTACATGGTGTCG        CGCTCCGCTTCTCTCTACG

The sorted result should be as follows (first column first, then the second column, split by "\t"):

    1 GGCTGCAGCTAACAGGTGA
    2 CCTCTGGCTCGCAGGTCATGGC
    3 GCTGCAGCTACATGGTGTCG
    4 TACTCGGGGAGCTGCGG
    5 CAGCGTCTTGCGCTCCT
    6 CGCTCCGCTTCTCTCTACG

What should I do?



Solution 1:[1]

You can do it easily in awk by storing the second column in an array and then outputting the saved values in the END rule, e.g.

awk '
  {
    print ++n, $1       # output first column
    a[n] = $2           # save second column in array
  } 
  END {
    j = n + 1           # j is next counter
    for (i=1;i<=n;i++)  # loop 1 - n
      print j++, a[i]   # output j and array value
  }
' file.txt

Example Use/Output

With your input in file.txt in the current directory, you can paste the command straight into a terminal (for example, select it and middle-mouse-paste into an xterm), e.g.

$ awk '
>   {
>     print ++n, $1       # output first column
>     a[n] = $2           # save second column in array
>   }
>   END {
>     j = n + 1           # j is next counter
>     for (i=1;i<=n;i++)  # loop 1 - n
>       print j++, a[i]   # output j and array value
>   }
> ' file.txt
1 GGCTGCAGCTAACAGGTGA
2 CCTCTGGCTCGCAGGTCATGGC
3 GCTGCAGCTACATGGTGTCG
4 TACTCGGGGAGCTGCGG
5 CAGCGTCTTGCGCTCCT
6 CGCTCCGCTTCTCTCTACG

Or as a 1-liner:

$ awk '{print ++n, $1; a[n]=$2} END {j=n+1; for (i=1;i<=n;i++) print j++, a[i]}' file.txt
1 GGCTGCAGCTAACAGGTGA
2 CCTCTGGCTCGCAGGTCATGGC
3 GCTGCAGCTACATGGTGTCG
4 TACTCGGGGAGCTGCGG
5 CAGCGTCTTGCGCTCCT
6 CGCTCCGCTTCTCTCTACG
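
The question mentions "\t" as a separator, while `print a, b` joins its arguments with awk's output field separator `OFS`, which is a single space by default. If a tab between the counter and the sequence is what is wanted, a minimal variant sets `OFS` on the command line. This is only a sketch: the function name `combine_cols_tab` is made up for illustration and is not part of the original answer.

```shell
# Hypothetical helper: same logic as the one-liner, but with a tab
# between the counter and the value.
# -v OFS='\t' sets the output field separator; awk expands the \t
# escape sequence in -v assignments.
combine_cols_tab() {
  awk -v OFS='\t' '
    { print ++n, $1; a[n] = $2 }   # print column 1 now, save column 2
    END {
      j = n + 1                    # continue numbering after the last row
      for (i = 1; i <= n; i++)
        print j++, a[i]            # print the saved column 2 values
    }
  ' "$1"
}

# Usage: combine_cols_tab file.txt
```

Only the separator changes; the order of the output lines is the same as in the one-liner.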

If you would rather keep the program in a file, save it as an awk script (say cmbcols.awk):

{
  print ++n, $1       # output first column
  a[n] = $2           # save second column in array
}
END {
  j = n + 1           # j is next counter
  for (i=1;i<=n;i++)  # loop 1 - n
    print j++, a[i]   # output j and array value
}

Then run the script on file.txt with:

$ awk -f cmbcols.awk file.txt
1 GGCTGCAGCTAACAGGTGA
2 CCTCTGGCTCGCAGGTCATGGC
3 GCTGCAGCTACATGGTGTCG
4 TACTCGGGGAGCTGCGG
5 CAGCGTCTTGCGCTCCT
6 CGCTCCGCTTCTCTCTACG
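
For a very large file, holding the whole second column in an array may be undesirable. One alternative, which is not part of the original answer, is to have awk read the file twice and print a different column on each pass. The sketch below assumes the input is a regular file (not a pipe, since it has to be opened twice), and the function name `combine_cols_twopass` is hypothetical.

```shell
# Hypothetical helper: pass the same file to awk twice.
# NR counts records across all inputs, FNR restarts for each file,
# so NR == FNR is true only while the first copy is being read.
combine_cols_twopass() {
  awk '
    NR == FNR { print ++n, $1; next }   # first pass: number and print column 1
              { print ++n, $2 }         # second pass: keep numbering, print column 2
  ' "$1" "$1"
}

# Usage: combine_cols_twopass file.txt
```

This trades memory for a second read of the file; for small inputs like the one in the question, the array version is just as good.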

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
