Insert pattern from current line in next line
I have this file:
>AX-899-Af-889-[A/G]
GTCCATTCAGGTAAAAAAAAAAAACATAACAATTGAAATTGCATGA
>AX-899-Af-889-[A/G]
GCAAACTATTTTCATGAATGAACTTCAGTTGATTGTGAGATG
>AX-899-Af-889-[G/T]
AAGGTAGAATGACACCATTAAACAGTAGGGAATTGGTCACAGAACTCT
I need to take the [X/X] pattern from each line starting with > and insert it into the next line at the 10th position, replacing the 10th character:
>AX-899-Af-889-[A/G]
GTCCATTCA[A/G]GTAAAAAAAAAAAACATAACAATTGAAATTGCATGA
>AX-899-Af-889-[A/G]
GCAAACTAT[A/G]TTCATGAATGAACTTCAGTTGATTGTGAGATG
>AX-899-Af-889-[G/T]
AAGGTAGAA[G/T]GACACCATTAAACAGTAGGGAATTGGTCACAGAACTCT
I can extract the pattern:
awk 'match($0, /^>/) {split($0,a,"-"); print; getline; print a[5]}1' file
I can also replace the 10th character with a fixed string ("N", for example):
sed 's/^\([ATCG].\{8\}\)[ATCG]/\1N/' file
Solution 1:[1]
With your shown samples, please try the following awk code.
awk '
BEGIN{ FS=OFS="-" }
/^>/ {
    val=$NF
    print
    next
}
{
    print substr($0,1,9) val substr($0,11)
    val=""
}
' Input_file
Explanation: a detailed, commented version of the above.
awk '                                        ##Starting awk program from here.
BEGIN{ FS=OFS="-" }                          ##Setting FS and OFS to - in the BEGIN section.
/^>/ {                                       ##If the line starts with > then do the following.
    val=$NF                                  ##Saving the last field ($NF), e.g. [A/G], in val.
    print                                    ##Printing the current line.
    next                                     ##next skips all further statements for this line.
}
{
    print substr($0,1,9) val substr($0,11)   ##Printing chars 1 to 9 of the current line,
                                             ##followed by val and the rest of the line from the 11th char on.
    val=""                                   ##Resetting val.
}
' Input_file                                 ##Mentioning Input_file name here.
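Note that Solution 1 resets val after the first sequence line, so if a record's sequence were wrapped over several lines (an assumption; every record in the sample has a single sequence line), the following line would be printed with its 10th character dropped. Below is a minimal sketch of a variant that only rewrites the first line after each header. It is wrapped in a shell function whose name, insert_allele, is hypothetical.

```shell
# Hypothetical helper name; assumes sequences may wrap over several lines
# and that only the first sequence line after each header should change.
insert_allele() {
  awk '
    BEGIN { FS = "-" }
    /^>/ { val = $NF; print; next }              # header: keep the [X/Y] field
    val != "" {                                  # first sequence line of a record
      print substr($0, 1, 9) val substr($0, 11)  # val replaces the 10th char
      val = ""
      next
    }
    { print }                                    # later lines are left untouched
  '
}

# Usage: the third sample record, wrapped onto two lines for illustration
insert_allele <<'EOF'
>AX-899-Af-889-[G/T]
AAGGTAGAATGACACCATTA
AACAGTAGGGAATTGGTCACAGAACTCT
EOF
# Prints:
# >AX-899-Af-889-[G/T]
# AAGGTAGAA[G/T]GACACCATTA
# AACAGTAGGGAATTGGTCACAGAACTCT
```

The only change from Solution 1 is the val != "" condition: once the pattern has been used, later lines fall through to the plain print instead of being spliced with an empty val.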
Solution 2:[2]
Another:
$ awk '
BEGIN { FS=OFS="" }             # each char is a field of its own (empty FS is a common extension, not POSIX)
{
    if(/^>/)                    # if record starts with a >
        b=substr($0,length-4,5) # get last 5 chars to buffer
    else                        # otherwise
        $10=b                   # replace 10th char with buffer
}1' file                        # 1: print the (modified) record
Some output:
>AX-899-Af-889-[A/G]
GTCCATTCA[A/G]GTAAAAAAAAAAAACATAACAATTGAAATTGCATGA
...
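Solution 2 relies on an empty FS, which splits a record into characters in gawk and several other awks but is not specified by POSIX, and it assumes the pattern is always exactly 5 characters long. Here is a sketch of a variant that avoids both by using index() to take everything from the "[" to the end of the header. The function name insert_allele_portable is hypothetical, and the code assumes each header contains exactly one "[". Like Solution 2, it rewrites every non-header line.

```shell
# Hypothetical helper name; assumes every header line contains exactly one "["
# and that the allele pattern runs from that "[" to the end of the line.
insert_allele_portable() {
  awk '
    /^>/ {
      b = substr($0, index($0, "["))            # e.g. [A/G] or [-/AT], any length
      print
      next
    }
    { print substr($0, 1, 9) b substr($0, 11) } # put b in place of the 10th char
  '
}

# Usage: a hypothetical header whose pattern is not 5 characters long
insert_allele_portable <<'EOF'
>AX-1-[-/AT]
ACGTACGTACGTACGT
EOF
# Prints:
# >AX-1-[-/AT]
# ACGTACGTA[-/AT]GTACGT
```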
Solution 3:[3]
Using sed
$ cat sed.script
/^>/{ #If the line starts with >
p #Print the header now, because d below discards the pattern space
s/[^[]*\([^]]*]\)/\1/ #Using back referencing, extract the [...] pattern at the end
h #Store the pattern in hold space
d #Now that it is stored in hold space, delete the pattern space
}
{
G #Append the contents of the hold space to that of the pattern space.
s/\n// #Remove the newline created by the previous command
s/\(.\{9\}\).\([^[]*\)\(.*\)/\1\3\2/ #Replace the 10th character with the pattern obtained from the hold space
}
$ sed -f sed.script input_file
>AX-899-Af-889-[A/G]
GTCCATTCA[A/G]GTAAAAAAAAAAAACATAACAATTGAAATTGCATGA
>AX-899-Af-889-[A/G]
GCAAACTAT[A/G]TTCATGAATGAACTTCAGTTGATTGTGAGATG
>AX-899-Af-889-[G/T]
AAGGTAGAA[G/T]GACACCATTAAACAGTAGGGAATTGGTCACAGAACTCT
Or as a one-liner:
$ sed '/^>/{p;s/[^[]*\([^]]*]\)/\1/;h;d};{G;s/\n//;s/\(.\{9\}\).\([^[]*\)\(.*\)/\1\3\2/}' input_file
Solution 4:[4]
Another idea using sed:
sed -E '/^>/{N;s/(.*-)(\[[^][]*])(\n.{9})./\1\2\3\2/}' file
Explanation
- /^>/ : if the line starts with >
- N : append the next line to the pattern space
- (.*-) : capture group 1, match up to the last occurrence of -
- (\[[^][]*]) : capture group 2, match from the opening to the closing square bracket [...]
- (\n.{9}). : capture the newline and the next 9 characters in group 3, then match the 10th character
- \1\2\3\2 : the replacement, using backreferences to the capture groups (group 3 already contains the newline), so the 10th character is replaced by group 2
Output
>AX-899-Af-889-[A/G]
GTCCATTCA[A/G]GTAAAAAAAAAAAACATAACAATTGAAATTGCATGA
>AX-899-Af-889-[A/G]
GCAAACTAT[A/G]TTCATGAATGAACTTCAGTTGATTGTGAGATG
>AX-899-Af-889-[G/T]
AAGGTAGAA[G/T]GACACCATTAAACAGTAGGGAATTGGTCACAGAACTCT
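To check what each capture group in Solution 4 holds, one option is to print the groups between braces instead of rebuilding the line, with the 10th character captured as an extra group 4. This is a debugging sketch that assumes GNU sed; show_groups is a hypothetical helper name.

```shell
# Hypothetical debugging helper: wraps each capture group of Solution 4 in {}
# so you can see what it matched. Assumes GNU sed (-E, \n in the regex).
show_groups() {
  # Group 4 is the 10th sequence character that Solution 4 discards.
  sed -E '/^>/{N;s/(.*-)(\[[^][]*])(\n.{9})(.)/{\1}{\2}{\3}{\4}/}'
}

show_groups <<'EOF'
>AX-899-Af-889-[A/G]
GTCCATTCAGGTAAAAAAAAAAAACATAACAATTGAAATTGCATGA
EOF
# Prints:
# {>AX-899-Af-889-}{[A/G]}{
# GTCCATTCA}{G}GTAAAAAAAAAAAACATAACAATTGAAATTGCATGA
```

The line break inside the third pair of braces shows that group 3 starts with the newline added by N, which is why the replacement in Solution 4 needs no explicit \n.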
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | James Brown |
| Solution 3 | |
| Solution 4 | The fourth bird |
