AWK: print column variable with each character separated by a space
I have a very large file like so:
ID Class Values
126 1 332222330442022...
753 1 332222330442022...
119 1 402224220402022...
830 1 002233440232022...
944 1 222222220002022...
The 3rd column is a string of 50,000 characters. I need to skip the header line, drop the 2nd column, replace every 3 or 4 in the 3rd column with 1, and finally print the 3rd column with every character separated by a space.
So the desired output is:
126 1 1 2 2 2 2 1 1 0 1 1 2 0 2 2...
753 1 1 2 2 2 2 1 1 0 1 1 2 0 2 2...
119 1 0 2 2 2 1 2 2 0 1 0 2 0 2 2...
830 0 0 2 2 1 1 1 1 0 2 1 2 0 2 2...
944 2 2 2 2 2 2 2 2 0 0 0 2 0 2 2...
Because the file is so large, it would be good to avoid using split on the 3rd column if possible.
So far, the following command does everything except print the 3rd column with its characters separated by spaces:
awk -F " " 'NR!= 1 { gsub(3,1,$3); gsub(4,1,$3); printf "%s\t%s\n", $1, $3 }' ./input.txt
I know I can use split(), similar to the answer here (Split tab delimited column with space), but I also need to print $1. Is it possible to separate the characters of the 3rd column in the same awk command?
Solution 1:[1]
You may use this awk:
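awk -v OFS='\t' 'NR > 1 {
  gsub(/[34]/, 1, $3)   # replace every 3 or 4 with 1
  gsub(/./, "& ", $3)   # append a space after each character (& = matched text)
  sub(/ $/, "", $3)     # remove the trailing space
  print $1, $3          # ID, a tab (OFS), then the spaced-out values
}' file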
Output:
126 1 1 2 2 2 2 1 1 0 1 1 2 0 2 2
753 1 1 2 2 2 2 1 1 0 1 1 2 0 2 2
119 1 0 2 2 2 1 2 2 0 1 0 2 0 2 2
830 0 0 2 2 1 1 1 1 0 2 1 2 0 2 2
944 2 2 2 2 2 2 2 2 0 0 0 2 0 2 2
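To see what each gsub()/sub() call in Solution 1 contributes, the sketch below runs the same three statements on one made-up record (x 1 3402) and prints $3 after every step. The function name trace_solution1 is hypothetical, and the square brackets in the output exist only to make the trailing space visible.

```shell
# Hypothetical helper: traces the three statements of Solution 1 on one made-up record.
trace_solution1() {
  printf 'x 1 3402\n' | awk '{
    gsub(/[34]/, 1, $3);   print "step 1: " $3        # 3 and 4 become 1
    gsub(/./, "& ", $3);   print "step 2: [" $3 "]"   # & is the matched character
    sub(/ $/, "", $3);     print "step 3: [" $3 "]"   # drop the trailing space
  }'
}

trace_solution1
```

This prints step 1: 1102, then step 2: [1 1 0 2 ], then step 3: [1 1 0 2], which shows why the final sub() is needed.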
Solution 2:[2]
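With your shown samples, please try the following awk code. Written and tested in GNU awk; note that split() with an empty separator (one array element per character) is an extension that POSIX does not guarantee.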
awk '
BEGIN{
  OFS="\t"
}
FNR>1{
  val=""
  gsub(/[34]/, 1, $3)
  num=split($3,arr,"")
  for(i=1;i<=num;i++){
    val=(i>1 ? val " " : "") arr[i]
  }
  print $1,val
}
' Input_file
Explanation: Adding a detailed explanation for the above code.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section of awk program from here.
  OFS="\t" ##Setting OFS to a tab; FS keeps its default, so spaces or tabs both separate fields.
}
FNR>1{ ##If the line is not the first (header) line, do the following.
  val="" ##Resetting val for each line.
  gsub(/[34]/, 1, $3) ##Globally replacing every 3 or 4 in $3 with 1.
  num=split($3,arr,"") ##Splitting $3 into array arr, one character per element (empty separator).
  for(i=1;i<=num;i++){ ##Looping from 1 to num.
    val=(i>1 ? val " " : "") arr[i] ##Appending each element of arr to val, separated by single spaces.
  }
  print $1,val ##Printing $1 and val, separated by OFS (a tab).
}
' Input_file ##Mentioning Input_file name here.
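If portability or memory use matters, a variant of Solution 2 can skip split() entirely: walk $3 with substr() and print each character as it is read, so neither a 50,000-element array nor a long val string is built. This is a sketch, not taken from either answer; the function name space_out_values is hypothetical, and it has not been benchmarked on the real data. FS is left at its default, so the same program reads space- or tab-separated input.

```shell
# Hypothetical helper: split()-free variant of Solution 2 (sketch).
# FS keeps its default, so fields may be separated by spaces or tabs.
space_out_values() {
  awk 'NR > 1 {
    s = $3
    gsub(/[34]/, "1", s)                  # every 3 or 4 becomes 1
    printf "%s\t", $1                     # ID, then a tab
    n = length(s)
    for (i = 1; i <= n; i++)              # one character at a time, no split()
      printf "%s%s", (i > 1 ? " " : ""), substr(s, i, 1)
    print ""                              # end the output line
  }' "$@"
}

# Both calls print: 126<TAB>1 1 2 2 2 2 1 1 0 1 1 2 0 2 2
printf 'ID Class Values\n126 1 332222330442022\n' | space_out_values
printf 'ID\tClass\tValues\n126\t1\t332222330442022\n' | space_out_values
```

To run it on the real file, pass the file name as an argument, e.g. space_out_values ./input.txt.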
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | anubhava |
| Solution 2 | RavinderSingh13 |
