Insert pattern from current line in next line

I have this file:

>AX-899-Af-889-[A/G]
GTCCATTCAGGTAAAAAAAAAAAACATAACAATTGAAATTGCATGA
>AX-899-Af-889-[A/G]
GCAAACTATTTTCATGAATGAACTTCAGTTGATTGTGAGATG
>AX-899-Af-889-[G/T]
AAGGTAGAATGACACCATTAAACAGTAGGGAATTGGTCACAGAACTCT

I need to take the [X/X] pattern from each line starting with > and insert it into the next line at the 10th position, replacing the character that is currently there:

>AX-899-Af-889-[A/G]
GTCCATTCA[A/G]GTAAAAAAAAAAAACATAACAATTGAAATTGCATGA
>AX-899-Af-889-[A/G]
GCAAACTAT[A/G]TTCATGAATGAACTTCAGTTGATTGTGAGATG
>AX-899-Af-889-[G/T]
AAGGTAGAA[G/T]GACACCATTAAACAGTAGGGAATTGGTCACAGAACTCT

I can extract the pattern:

awk  'match($0, /^>/) {split($0,a,"-");  print; getline; print a[5]}1' file 

I can also replace the 10th character with a fixed character ("N", for example):

sed 's/^\([ATCG].\{8\}\)[ATCG]/\1N/' file

Solution 1:[1]

With the samples you have shown, please try the following awk program.

awk '
BEGIN{ FS=OFS="-" }
/^>/ {
  val=$NF
  print
  next
}
{
  print substr($0,1,9) val substr($0,11)
  val=""
}
'  Input_file

Explanation: the same program with a comment on each step.

awk '                      ## Start of the awk program.
BEGIN{ FS=OFS="-" }        ## Set the input and output field separators to "-".
/^>/ {                     ## If the line starts with ">":
  val=$NF                  ##   save the last field (the [X/X] pattern) in val,
  print                    ##   print the header line unchanged,
  next                     ##   and skip the remaining rules for this line.
}
{                          ## For every other line:
  print substr($0,1,9) val substr($0,11)  ## print characters 1 to 9, then val,
                           ## then everything from the 11th character onward.
  val=""                   ## Clear val.
}
'  Input_file              ## Name of the input file.
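
The position 10 is hard-coded in the two substr() calls above. A minimal sketch of the same idea with the column passed in as a variable follows; the function name insert_at, the variable pos and the temporary sample file are assumptions made for this illustration, not part of the original answer.

```shell
# Sample records copied from the question; the temp file exists only for this demo.
sample=$(mktemp)
cat > "$sample" <<'EOF'
>AX-899-Af-889-[A/G]
GTCCATTCAGGTAAAAAAAAAAAACATAACAATTGAAATTGCATGA
>AX-899-Af-889-[G/T]
AAGGTAGAATGACACCATTAAACAGTAGGGAATTGGTCACAGAACTCT
EOF

# insert_at POS FILE: replace the character at 1-based column POS of each
# sequence line with the last "-"-separated field of the preceding header.
insert_at() {
  awk -v pos="$1" '
    BEGIN { FS = "-" }
    /^>/ { val = $NF; print; next }   # remember the pattern, e.g. [A/G]
    { print substr($0, 1, pos - 1) val substr($0, pos + 1); val = "" }
  ' "$2"
}

insert_at 10 "$sample"
```

Called with 10 this should reproduce the output requested in the question; any other column works the same way, for example insert_at 1 "$sample" replaces the first character.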

Solution 2:[2]

Another:

$ awk '
BEGIN { FS=OFS="" }               # each char is a field of its own
{
    if(/^>/)                      # if record starts with a >
        b=substr($0,length-4,5)   # get last 5 chars to buffer
    else                          # otherwise
        $10=b                     # replace 10th char with buffer
}1' file                          # output

Some output:

>AX-899-Af-889-[A/G]
GTCCATTCA[A/G]GTAAAAAAAAAAAACATAACAATTGAAATTGCATGA
...
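
Two things in the program above are worth knowing: splitting into one field per character with an empty FS works in gawk and mawk, but POSIX leaves that behaviour unspecified, and substr($0,length-4,5) assumes the bracketed tag is exactly five characters long. A possible variant that avoids both assumptions is sketched below; the function name replace_col10, the variable tag and the demo records are made up for this illustration.

```shell
# replace_col10 FILE: put the trailing [...] tag of each header line in place
# of the 10th character of the sequence line that follows it.
replace_col10() {
  awk '
    /^>/ {
      tag = ""
      # match() sets RSTART and RLENGTH to the trailing [...] tag, if there is one
      if (match($0, /\[[^]]*\]$/))
        tag = substr($0, RSTART, RLENGTH)
      print
      next
    }
    tag != "" {                       # first sequence line after a tagged header
      $0 = substr($0, 1, 9) tag substr($0, 11)
      tag = ""                        # leave any further sequence lines untouched
    }
    { print }
  ' "$1"
}

# Hypothetical demo input with a six-character tag and a two-line sequence.
demo=$(mktemp)
printf '%s\n' '>AX-1-[AT/G]' 'AAAAAAAAACGGG' 'TTTTTTTTTTTT' > "$demo"
replace_col10 "$demo"
```

Unlike the original, this sketch clears the tag after one use, so only the first sequence line after each header is changed. Whether that is wanted depends on the data; the file in the question has exactly one sequence line per header, so both behave the same there.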

Solution 3:[3]

Using sed

$ cat sed.script
/^>/{                          # If the line starts with >
    p                          # print it now (the header is output unchanged)
    s/[^[]*\([^]]*]\)/\1/      # reduce the pattern space to the trailing [X/X] pattern
    h                          # copy that pattern to the hold space
    d                          # delete the pattern space and start the next cycle
}
{
    G                          # Append a newline plus the hold space to the sequence line
    s/\n//                     # remove that newline
    s/\(.\{9\}\).\([^[]*\)\(.*\)/\1\3\2/ # replace the 10th character with the appended pattern
}
$ sed -f sed.script input_file
>AX-899-Af-889-[A/G]
GTCCATTCA[A/G]GTAAAAAAAAAAAACATAACAATTGAAATTGCATGA
>AX-899-Af-889-[A/G]
GCAAACTAT[A/G]TTCATGAATGAACTTCAGTTGATTGTGAGATG
>AX-899-Af-889-[G/T]
AAGGTAGAA[G/T]GACACCATTAAACAGTAGGGAATTGGTCACAGAACTCT

Or as a one-liner:

$ sed '/^>/{p;s/[^[]*\([^]]*]\)/\1/;h;d};{G;s/\n//;s/\(.\{9\}\).\([^[]*\)\(.*\)/\1\3\2/}' input_file

Solution 4:[4]

Another idea using sed:

sed -E '/^>/{N;s/(.*-)(\[[^][]*])(\n.{9})./\1\2\3\2/}' file

Explanation

  • /^>/ If the line starts with >
  • N Append the next line to the pattern space
  • (.*-) Capture group 1, match till the last occurrence of -
  • (\[[^][]*]) Capture group 2, match from opening to closing square brackets [...]
  • (\n.{9}). Capture a newline and 9 characters in group 3 and match the 10th character
  • \1\2\3\2 The replacement, built from the backreferences; the newline is already part of group 3, and the trailing \2 takes the place of the 10th character
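
To see the groups at work, the command can be run on a single record fed in with printf. This is only a small check, assuming a sed that accepts -E (GNU and BSD sed both do); the variable name result is arbitrary.

```shell
# One header/sequence pair from the question, piped straight into the command.
result=$(printf '%s\n' '>AX-899-Af-889-[A/G]' \
                       'GTCCATTCAGGTAAAAAAAAAAAACATAACAATTGAAATTGCATGA' |
  sed -E '/^>/{N;s/(.*-)(\[[^][]*])(\n.{9})./\1\2\3\2/}')

# Group 3 carries the newline, so the replacement needs no explicit \n.
printf '%s\n' "$result"
```

The second line of the result should read GTCCATTCA[A/G]GTAAAAAAAAAAAACATAACAATTGAAATTGCATGA, matching the output shown for the full file.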

Output

>AX-899-Af-889-[A/G]
GTCCATTCA[A/G]GTAAAAAAAAAAAACATAACAATTGAAATTGCATGA
>AX-899-Af-889-[A/G]
GCAAACTAT[A/G]TTCATGAATGAACTTCAGTTGATTGTGAGATG
>AX-899-Af-889-[G/T]
AAGGTAGAA[G/T]GACACCATTAAACAGTAGGGAATTGGTCACAGAACTCT

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 James Brown
Solution 3
Solution 4 The fourth bird