'How to replace 00 with Na excluding first row & first column using bash in comma separated file
I'm working with GWAS data, My data looks like this:
IID,kgp11004425,rs11274005,kgp183005,rs746410036,kgp7979600
1,00,AG,GT,AK,00
32,AG,GG,AA,00,AT
100,TT,AA,00,AG,AA
3,GG,AG,00,GT,GG
Desired Output:
IID,kgp11004425,rs11274005,kgp183005,rs746410036,kgp7979600
1,N/A,AG,GT,AK,N/A
32,AG,GG,AA,N/A,AT
100,TT,AA,N/A,AG,AA
3,GG,AG,N/A,GT,GG
Here I'm trying to replace "00" with "N/A", but since I have 00 in the first row & First Column (IID), the command I used:
sed '1!s~00~N/A~g' allSNIPsFinaldata.csv
The above command excludes the first row but not the first column as a result I got IID Values 100, 200, and 300 as 1N/A, 2N/A, and 3N/A. Can anyone please help "how to exclude the first row & First Column as well and perform the above operation. please help
Solution 1:[1]
Using any awk in any shell on every Unix box:
$ awk '{$0=$0","; gsub(/,00,/,",N/A,"); sub(/,$/,"")} 1' file
IID,kgp11004425,rs11274005,kgp183005,rs746410036,kgp7979600
1,N/A,AG,GT,AK,N/A
32,AG,GG,AA,N/A,AT
100,TT,AA,N/A,AG,AA
3,GG,AG,N/A,GT,GG
The above assumes none of the column names on the first row will be the exact string 00. If they can be then just tweak the above to:
awk 'NR>1{$0=$0","; gsub(/,00,/,",N/A,"); sub(/,$/,"")} 1' file
Solution 2:[2]
With your shown samples in GNU awk using its gensub function, please try following awk program.
awk '
BEGIN{
FS=OFS=","
}
FNR==1{
print
next
}
{
secondPart=gensub(/^[^,]*,(.*)/,"\\1","g")
sub(/^00,/,"N/A,",secondPart)
gsub(/,00,/,",N/A,",secondPart)
sub(/,00$/,",N/A",secondPart)
print $1 OFS secondPart
}
' Input_file
Solution 3:[3]
An awk:
$ awk '
BEGIN {
FS=OFS="," # set field delimiters to a comma
}
FNR>1 { # process records after the first
for(i=1;i<=NF;i++) # iterate all fields (maybe start from 2nd?)
if($i=="00") # if field is 00
$i="N/A" # replace
}1' file # output
Output:
IID,kgp11004425,rs11274005,kgp183005,rs746410036,kgp7979600
1,N/A,AG,GT,AK,N/A
32,AG,GG,AA,N/A,AT
100,TT,AA,N/A,AG,AA
3,GG,AG,N/A,GT,GG
Solution 4:[4]
Assuming the columns are separated by space characters such as whitespace
or tab character, would you please try:
sed -E '1!s~([[:space:]])00([[:space:]]|$)~\1N/A\2~g' allSNIPsFinaldata.csv
- The address
1!skips the 1st row. - The regex
([[:space:]])00([[:space:]]|$)matches the00string preceded by a space character (it prevents to match the 1st column) and followed by a space character or the end of the line.
Solution 5:[5]
This might work for you (GNU sed):
sed -E '1!{s/,00(,|$)/,N\/A\1/g;s//,N\/A\1/g}' file
If not the first line and , followed by 00 followed by , or end-of-line, replace the 00 by N/A and other parts of the match remain unchanged.
This substitution is global but needs to be implemented twice because the patterns may overlap.
Solution 6:[6]
If you want to repolace only 00 in other columns you have to add a delimiter (I am assuming space in my command) in your pattern:
sed -i 's~ 00 ~ N/A ~g' allSNIPsFinaldata.csv
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Ed Morton |
| Solution 2 | |
| Solution 3 | James Brown |
| Solution 4 | tshiono |
| Solution 5 | |
| Solution 6 | Egor Lipchinskiy |
