awk: how to split fields and replace blanks with NA
I am having trouble with an awk script. I want to split a file into two files, and it mostly works, but I have one remaining issue.
This is one of my input files:
samplexxx EH Tred GangSTR
dijen006 nofile nofile nofile
dijen006_100 22,30 22,27 19,25
dijen006_75 25,27 29 NA
dijen017 nofile nofile nofile
dijen017_100 75,121 54 24,24
dijen017_75 74,131 72 19,19
dijen081 63,84 32 40,40
dijen081_100 70,115 78 25,41
dijen081_75 79,143 95 24,104
dijen082 47,51 38 15,34
dijen082_100 46,61 52 6,32
dijen082_75 NA 55 17,17
dijen083 30,53 30,40 38,38
dijen083_100 43,53 30,59 23,32
dijen083_75 43,60 18,74 23,71
dijen1013 30 30 20,30
dijen1013_100 30 30 9,19
dijen1013_75 21 33 20,20
dijen1014 9,30 9,30 9,30
dijen1014_100 9,28 9,43 9,11
dijen1014_75 9,28 9,36 9,29
dijen1015 23,30 23,30 23,29
dijen1015_100 23,30 NA 13,22
dijen1015_75 25,27 21,42 22,39
dijen402 25,31 25,31 25,31
dijen402_100 30 29,36 14,30
dijen402_75 25,26 22,39 22,39
I am using this code:
#!/bin/awk -f
#USAGE = awk -v my_var=$(basename $i .tsv) -f split_file_allelle.awk $i
BEGIN { FS=OFS="\t" }
NR == 1 {
    str1 = str2 = $0
}
NR > 1 {
    str1 = str2 = $1
    for (i=2; i<=NF; i++) {
        split($i,a,/,/)
        str1 = str1 OFS a[1]
        str2 = str2 OFS a[2]
    }
}
{
    print str1 > my_var"_all1.tsv"
    print str2 > my_var"_all2.tsv"
}
This gives me two files, with each field split on the ",". The second file looks like the one below. Is there a way to get 'NA' instead of a blank in the second file wherever there is no second number?
samplexxx EH Tred GangSTR
dijen006
dijen006_100 30 27 25
dijen006_75 27
dijen017
dijen017_100 121 24
dijen017_75 131 19
dijen081 84 40
dijen081_100 115 41
dijen081_75 143 104
dijen082 51 34
dijen082_100 61 32
dijen082_75 17
dijen083 53 40 38
dijen083_100 53 59 32
dijen083_75 60 74 71
dijen1013 30
dijen1013_100 19
dijen1013_75 20
dijen1014 30 30 30
dijen1014_100 28 43 11
dijen1014_75 28 36 29
dijen1015 30 30 29
dijen1015_100 30 22
dijen1015_75 27 42 39
dijen402 31 31 31
dijen402_100 36 30
dijen402_75 26 39 39
This is what I get, but I would like something like this:
samplexxx EH Tred GangSTR
dijen006 NA NA NA
dijen006_100 30 27 25
dijen006_75 27 NA NA
dijen017 NA NA NA
dijen017_100 121 NA 24
....
Thanks for your help!
Solution 1:[1]
BEGIN {
    FS = OFS = "\t"
    all1 = my_var "_all1.tsv"
    all2 = my_var "_all2.tsv"
}
NR == 1 {
    str1 = str2 = $0
}
NR > 1 {
    str1 = str2 = $1
    for (i=2; i<=NF; i++) {
        n = split($i,a,",")
        str1 = str1 OFS a[1]
        str2 = str2 OFS (n == 1 ? "NA" : a[2])
    }
}
{
    print str1 > all1
    print str2 > all2
}
The fix for the specific problem you asked about is the ternary that tests the return value of split(): split() returns the number of pieces it produced, so n == 1 means the field contained no comma and there is no second value to print. Changing print str1 > my_var"_all1.tsv" to print str1 > all1 wasn't necessary for that. However, an unparenthesized concatenation on the right side of output redirection, as in print str1 > my_var"_all1.tsv", is undefined behavior per POSIX, so it will fail in some awks. It needs to be written either using a variable, as I have, or with parens around the expression that generates the file name: print str1 > (my_var"_all1.tsv"). Using a variable also does the concatenation once in total instead of once per line, which is more efficient.
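To check this on a small case, here is a minimal sketch that runs the same awk logic on the first four rows of the question's input. The file name demo.tsv and the prefix demo are placeholders chosen for illustration, and the sample is written with explicit tabs on the assumption that the real input is tab-separated.

```shell
#!/bin/sh
# Build a tiny tab-separated sample (rows taken from the question's input).
# "demo.tsv" is a hypothetical file name used only for this sketch.
printf 'samplexxx\tEH\tTred\tGangSTR\n'       >  demo.tsv
printf 'dijen006\tnofile\tnofile\tnofile\n'   >> demo.tsv
printf 'dijen006_100\t22,30\t22,27\t19,25\n'  >> demo.tsv
printf 'dijen006_75\t25,27\t29\tNA\n'         >> demo.tsv

# my_var is the output prefix, as in the question; "demo" is a placeholder.
awk -v my_var=demo '
BEGIN {
    FS = OFS = "\t"
    all1 = my_var "_all1.tsv"     # build each file name once
    all2 = my_var "_all2.tsv"
}
NR == 1 { str1 = str2 = $0 }      # header line goes to both files unchanged
NR > 1 {
    str1 = str2 = $1
    for (i = 2; i <= NF; i++) {
        n = split($i, a, ",")     # n is the number of comma-separated pieces
        str1 = str1 OFS a[1]
        str2 = str2 OFS (n == 1 ? "NA" : a[2])   # no second piece: print NA
    }
}
{
    print str1 > all1
    print str2 > all2
}' demo.tsv

# Show the second file; fields are tab-separated:
#   samplexxx     EH  Tred  GangSTR
#   dijen006      NA  NA    NA
#   dijen006_100  30  27    25
#   dijen006_75   27  NA    NA
cat demo_all2.tsv
```

If a field could ever be completely empty, split() would return 0 rather than 1, so testing n < 2 instead of n == 1 would cover that case as well. The input shown in the question has no empty fields, so this may not matter here.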
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
