awk: how to split fields and replace blanks with NA

I'm having trouble with awk. I want to split a file into 2 files. It mostly works, but I have one last issue.

This is one of my input files:

samplexxx       EH      Tred    GangSTR
dijen006        nofile  nofile  nofile
dijen006_100    22,30   22,27   19,25
dijen006_75     25,27   29      NA
dijen017        nofile  nofile  nofile
dijen017_100    75,121  54      24,24
dijen017_75     74,131  72      19,19
dijen081        63,84   32      40,40
dijen081_100    70,115  78      25,41
dijen081_75     79,143  95      24,104
dijen082        47,51   38      15,34
dijen082_100    46,61   52      6,32
dijen082_75     NA      55      17,17
dijen083        30,53   30,40   38,38
dijen083_100    43,53   30,59   23,32
dijen083_75     43,60   18,74   23,71
dijen1013       30      30      20,30
dijen1013_100   30      30      9,19
dijen1013_75    21      33      20,20
dijen1014       9,30    9,30    9,30
dijen1014_100   9,28    9,43    9,11
dijen1014_75    9,28    9,36    9,29
dijen1015       23,30   23,30   23,29
dijen1015_100   23,30   NA      13,22
dijen1015_75    25,27   21,42   22,39
dijen402        25,31   25,31   25,31
dijen402_100    30      29,36   14,30
dijen402_75     25,26   22,39   22,39

I am using this code:

#!/bin/awk -f
#USAGE = awk -v my_var=$(basename $i .tsv) -f split_file_allelle.awk $i

BEGIN { FS=OFS="\t" }
NR == 1 {
    str1 = str2 = $0
}
NR > 1 {
    str1 = str2 = $1
    for (i=2; i<=NF; i++) {
        split($i,a,/,/)
        str1 = str1 OFS a[1]
        str2 = str2 OFS a[2]
    }
}
{
    print str1 > my_var"_all1.tsv"
    print str2 > my_var"_all2.tsv"
}
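The blanks come from the split() call. For a field without a comma (29, nofile or NA in the input above), split() only fills a[1], so a[2] is the empty string. A small sketch to see this; the shell function name show_split is just an illustrative choice:

```shell
#!/bin/sh
# Hypothetical helper: for each argument, print the field, the number of
# pieces split() found, and the first two pieces, separated by "|".
show_split() {
    printf '%s\n' "$@" | awk '{
        n = split($0, a, /,/)
        printf "%s|%d|%s|%s\n", $0, n, a[1], a[2]
    }'
}

show_split 22,30 29 nofile NA
# 22,30|2|22|30
# 29|1|29|
# nofile|1|nofile|
# NA|1|NA|
```

The empty last column in the single-value lines is exactly the blank that ends up in the second output file.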

This gives me two files, split on the ",". The second one looks like this. Is there a way to get something like 'NA' instead of a blank wherever the second file has no number?

samplexxx       EH      Tred    GangSTR
dijen006                        
dijen006_100    30      27      25
dijen006_75     27              
dijen017                        
dijen017_100    121             24
dijen017_75     131             19
dijen081        84              40
dijen081_100    115             41
dijen081_75     143             104
dijen082        51              34
dijen082_100    61              32
dijen082_75                     17
dijen083        53      40      38
dijen083_100    53      59      32
dijen083_75     60      74      71
dijen1013                       30
dijen1013_100                   19
dijen1013_75                    20
dijen1014       30      30      30
dijen1014_100   28      43      11
dijen1014_75    28      36      29
dijen1015       30      30      29
dijen1015_100   30              22
dijen1015_75    27      42      39
dijen402        31      31      31
dijen402_100            36      30
dijen402_75     26      39      39

This is what I have, but I would like something like this:

samplexxx       EH      Tred    GangSTR
dijen006        NA      NA      NA               
dijen006_100    30      27      25
dijen006_75     27      NA      NA   
dijen017        NA      NA      NA          
dijen017_100    121     NA      24
 .... 

Thanks for your help!



Solution 1:[1]

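BEGIN {
    FS = OFS = "\t"
    all1 = my_var "_all1.tsv"    # build the output file names once
    all2 = my_var "_all2.tsv"
}
NR == 1 {
    str1 = str2 = $0             # header line goes to both files unchanged
}
NR > 1 {
    str1 = str2 = $1
    for (i=2; i<=NF; i++) {
        n = split($i,a,",")      # n = number of comma-separated values
        str1 = str1 OFS a[1]
        str2 = str2 OFS (n == 1 ? "NA" : a[2])   # no second value -> NA
    }
}
{
    print str1 > all1
    print str2 > all2
}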

It wasn't necessary to change print str1 > my_var"_all1.tsv" to print str1 > all1 to solve the specific problem you asked about. The ternary does that by testing split()'s return value, which is the number of pieces it produced: n == 1 means the field contained no comma, so the second file gets "NA". BUT print str1 > my_var"_all1.tsv" is undefined behavior per POSIX, so it would fail in some awks. It needs to be written either with a variable, as I have done, or with parens around the expression that generates the file name: print str1 > (my_var"_all1.tsv"). Using a variable also means the concatenation is done once in total instead of once per line, which is more efficient.
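For comparison, a sketch of the same program written with the parenthesized redirection instead of the all1/all2 variables, wrapped in a shell function so it can be tried on a few rows from the question. The function name split_alleles, its two arguments, and the use of a temporary directory are illustrative assumptions; it expects tab-separated input like the question's file.

```shell
#!/bin/sh
# Hypothetical wrapper: $1 = input TSV file, $2 = output prefix (the role of my_var).
split_alleles() {
    awk -v my_var="$2" '
    BEGIN { FS = OFS = "\t" }
    NR == 1 { str1 = str2 = $0 }          # copy the header to both files
    NR > 1 {
        str1 = str2 = $1
        for (i = 2; i <= NF; i++) {
            n = split($i, a, ",")         # number of comma-separated pieces
            str1 = str1 OFS a[1]
            str2 = str2 OFS (n == 1 ? "NA" : a[2])
        }
    }
    {
        # parenthesized file name expression, the portable alternative
        print str1 > (my_var "_all1.tsv")
        print str2 > (my_var "_all2.tsv")
    }' "$1"
}

# Try it on three rows from the question (tab-separated).
dir=$(mktemp -d)
printf 'samplexxx\tEH\tTred\tGangSTR\n' > "$dir/demo.tsv"
printf 'dijen006\tnofile\tnofile\tnofile\n' >> "$dir/demo.tsv"
printf 'dijen006_75\t25,27\t29\tNA\n' >> "$dir/demo.tsv"
split_alleles "$dir/demo.tsv" "$dir/demo"
cat "$dir/demo_all2.tsv"
```

After running it, demo_all2.tsv should hold the header followed by dijen006 NA NA NA and dijen006_75 27 NA NA (tab-separated), matching the output the question asks for, while demo_all1.tsv keeps the first values (nofile, 25, 29, NA).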

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

[1] Solution 1