Edit only specific lines when I find a special character with awk

I have this kind of file:

>AX-89948491-minus
CTAACACATTTAGTAGATT
>AX-89940152-plus
cgtcattcagggcaggtggggcaaaA
>AX-89922107-plus
TTATAACTTGTGTATGCTCTCAGGCT

When a line starts with ">" and contains "minus", I need to reverse (rev) and translate (tr) the line that follows it. I should get:

>AX-89948491-minus
AATCTACTAAATGTGTTAG
>AX-89940152-plus
cgtcattcagggcaggtggggcaaaA
>AX-89922107-plus
TTATAACTTGTGTATGCTCTCAGGCT

I would like to do this with awk. I tried the following, but it does not work:

awk '{if(NR%2==1~/"plus"/){print;getline;print} else if (NR%2==1~/"minus"/){system("echo "$0" | rev | tr ATCGatcg TAGCtagc")} else {print;getline;print}}' file

Any help?


Solution 1:[1]

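This GNU awk solution should work for you. It relies on the gawk-specific |& operator and on the shell that awk invokes understanding the <<< here-string: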

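awk '
p {
   # Previous line was a "minus" header: reverse and complement this line.
   # \047 is a single quote, used to quote the sequence for the shell.
   cmd = "rev <<< \047" $0 "\047 | tr ATCGatcg TAGCtagc"
   if ((cmd |& getline var) > 0)
      $0 = var
   close(cmd)   # release the process so a repeated sequence is run again
}
{
   # Remember whether this line is a header containing "-minus".
   p = /^>/ && /-minus/
} 1' file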

>AX-89948491-minus
AATCTACTAAATGTGTTAG
>AX-89940152-plus
cgtcattcagggcaggtggggcaaaA
>AX-89922107-plus
TTATAACTTGTGTATGCTCTCAGGCT
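
Solution 1 leans on two non-portable pieces: the gawk-only |& operator and the <<< here-string, which not every sh understands (dash, for example, does not). The sketch below is one possible variant, not the answer author's code. It assumes rev is installed and that sequence lines never contain a single quote. The function name revcomp_minus_rev is made up for illustration.

```shell
# Hypothetical wrapper name; reads the named files, or stdin if none given.
revcomp_minus_rev() {
    awk '
    p {
        # Feed the line to rev through printf instead of a here-string,
        # and read the result back over an ordinary one-way pipe.
        # \047 is a single quote; the line itself must not contain one.
        cmd = "printf \"%s\\n\" \047" $0 "\047 | rev | tr ATCGatcg TAGCtagc"
        if ((cmd | getline var) > 0)
            $0 = var
        close(cmd)   # allow the same command string to be run again later
    }
    { p = /^>/ && /-minus/ }   # flag the line after a "-minus" header
    1' "$@"
}
```

It would be called as revcomp_minus_rev file. It still starts one shell pipeline per flagged line, so it is no faster than the original.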

Solution 2:[2]

Awk is a tool to manipulate text, not a tool to sequence calls to other tools. The latter is what a shell is for. There are times when you need to call other tools from awk but not when it's simple text manipulation like reversing and translating characters in a string as you want to do.

The following works with any awk in any shell on every Unix box, and it does not spawn a subshell once per target input line to call other Unix tools (including rev, which is not defined by POSIX and won't exist on some Unix boxes):

$ cat tst.awk
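BEGIN {
    # Map each base to its complement: tr["A"] = "T", tr["T"] = "A", etc.
    split("ATCGatcg TAGCtagc",tmp)
    for (i=1; i<=length(tmp[1]); i++) {
        tr[substr(tmp[1],i,1)] = substr(tmp[2],i,1)
    }
}
f {
    # The previous line was a "minus" header: prepending each translated
    # character to out reverses and complements the line in one pass.
    out = ""
    for (i=1; i<=length($0); i++) {
        char = substr($0,i,1)
        out = (char in tr ? tr[char] : char) out
    }
    $0 = out
    f = 0
}
# Flag the line that follows a header containing "minus".
/^>.*minus/ { f=1 }
{ print }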

$ awk -f tst.awk file
>AX-89948491-minus
AATCTACTAAATGTGTTAG
>AX-89940152-plus
cgtcattcagggcaggtggggcaaaA
>AX-89922107-plus
TTATAACTTGTGTATGCTCTCAGGCT
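
The sample input keeps every sequence on a single line, and tst.awk above handles exactly that case. If a sequence were wrapped over several lines (an assumption; the question does not say so), reversing line by line would give the wrong result. A minimal sketch for that case buffers each record and reverse-complements it as a whole. The function name revcomp_wrapped is hypothetical, and the sketch prints each sequence unwrapped on one line.

```shell
# Hypothetical wrapper name; reads the named files, or stdin if none given.
revcomp_wrapped() {
    awk '
    BEGIN {
        # Build the complement lookup table, as tst.awk does.
        n = split("A T C G a t c g", from, " ")
        split("T A G C t a g c", to, " ")
        for (i = 1; i <= n; i++) comp[from[i]] = to[i]
    }
    # Print the buffered sequence, reverse-complemented if it was flagged.
    function emit(   i, ch, out) {
        if (!have) return
        if (minus) {
            out = ""
            for (i = 1; i <= length(seq); i++) {
                ch = substr(seq, i, 1)
                out = ((ch in comp) ? comp[ch] : ch) out
            }
            seq = out
        }
        print seq
        seq = ""
        have = 0
    }
    # A header ends the previous record and decides the fate of the next one.
    /^>/ { emit(); print; minus = /minus/; next }
    # Any other line is appended to the current sequence buffer.
    { seq = seq $0; have = 1 }
    END { emit() }
    ' "$@"
}
```

It would be called as revcomp_wrapped file. On input that is already one line per sequence, it should produce the same output as tst.awk.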

Solution 3:[3]

I'd use perl, as it has built-in reverse and tr functions:

perl -lpe '
    if (/^>/) {$rev = /minus/; next}
    if ($rev) {$_ = reverse; tr/ATCGatcg/TAGCtagc/}
' file
>AX-89948491-minus
AATCTACTAAATGTGTTAG
>AX-89940152-plus
cgtcattcagggcaggtggggcaaaA
>AX-89922107-plus
TTATAACTTGTGTATGCTCTCAGGCT

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2
Solution 3 glenn jackman