AWK: print column variable with each character separated by a space

I have a very large file like so:

ID      Class     Values
126       1       332222330442022...
753       1       332222330442022...
119       1       402224220402022...
830       1       002233440232022...
944       1       222222220002022...

The 3rd column is a string of 50,000 characters. I need to ignore the top line, drop the 2nd column, replace every 3 or 4 in the 3rd column with 1, and finally print the 3rd column with every character separated by a space.

So the desired output is:

126    1 1 2 2 2 2 1 1 0 1 1 2 0 2 2...
753    1 1 2 2 2 2 1 1 0 1 1 2 0 2 2...
119    1 0 2 2 2 1 2 2 0 1 0 2 0 2 2...
830    0 0 2 2 1 1 1 1 0 2 1 2 0 2 2...
944    2 2 2 2 2 2 2 2 0 0 0 2 0 2 2...

Because the file is so large, it would be good to avoid using split on the 3rd column if possible.

So far, I can achieve everything except printing the 3rd column separated by spaces, using the following:

awk -F " " 'NR!= 1 { gsub(3,1,$3); gsub(4,1,$3); printf "%s\t%s\n", $1, $3 }' ./input.txt

I know I can use split() similar to the answer here (Split tab delimited column with space) but I need to print $1 also. Is it possible to separate the 3rd column in the same awk command?



Solution 1:[1]

You may use this awk:

awk -v OFS='\t' 'NR > 1 {
   gsub(/[34]/, 1, $3)
   gsub(/./, "& ", $3)
   sub(/ $/, "", $3)
   print $1, $3
}' file

126    1 1 2 2 2 2 1 1 0 1 1 2 0 2 2
753    1 1 2 2 2 2 1 1 0 1 1 2 0 2 2
119    1 0 2 2 2 1 2 2 0 1 0 2 0 2 2
830    0 0 2 2 1 1 1 1 0 2 1 2 0 2 2
944    2 2 2 2 2 2 2 2 0 0 0 2 0 2 2
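
A minimal sketch of the same idea that works on a copy of the field instead of on $3 itself. Assigning to a field makes awk recompute $0, so using a plain variable may save some work on 50,000-character fields; I have not benchmarked this. The function name spread_copy and the shortened sample rows are made up for illustration.

```shell
# Hypothetical helper: reads the table from stdin or from the named files.
spread_copy() {
  awk 'NR > 1 {
    s = $3                  # work on a copy; $3 and $0 stay untouched
    gsub(/[34]/, "1", s)    # every 3 or 4 becomes 1
    gsub(/./, "& ", s)      # put a space after each character
    sub(/ $/, "", s)        # drop the trailing space
    print $1 "\t" s         # ID, a tab, then the spaced-out string
  }' "$@"
}

# Shortened sample rows (made-up values, 6 characters instead of 50,000).
printf '%s\n' 'ID Class Values' '126 1 332204' '119 1 402224' | spread_copy
```

On the real data it would be called as spread_copy file.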

Solution 2:[2]

With your shown samples, please try the following awk code. It was written and tested in GNU awk.

awk '
BEGIN{
  FS=OFS="\t"
}
FNR>1{
  val=""
  gsub(/[34]/, 1, $3)
  num=split($3,arr,"")
  for(i=1;i<=num;i++){
    val=(val?val " ":"") arr[i]
  }
  print $1,val
}
'  Input_file

Explanation: A detailed explanation of the above code.

awk '                              ##Start of the awk program.
BEGIN{                             ##BEGIN block, run once before any input is read.
  FS=OFS="\t"                      ##Set both FS and OFS to a tab.
}
FNR>1{                             ##For every line except the first (header) line:
  val=""                           ##Reset val.
  gsub(/[34]/, 1, $3)              ##Replace every 3 or 4 in $3 with 1.
  num=split($3,arr,"")             ##Split $3 into array arr, one character per element (empty separator).
  for(i=1;i<=num;i++){             ##Loop over all num elements.
    val=(val?val " ":"") arr[i]    ##Append each element to val, separated by a space.
  }
  print $1,val                     ##Print $1 and val, separated by OFS.
}
'  Input_file                      ##Name of the input file.
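
Since the question asks to avoid split, and splitting on an empty separator is an extension rather than guaranteed behaviour in every awk, here is a sketch that walks the string with substr and prints each character as it goes. It assumes whitespace-separated input (the default FS); the function name spread_substr and the sample rows are hypothetical. I would expect the gsub(/./, "& ", ...) approach from Solution 1 to be faster on very long fields, but that is a guess, not a measurement.

```shell
# Hypothetical helper: reads the table from stdin or from the named files.
spread_substr() {
  awk 'NR > 1 {
    s = $3                      # copy of the third column
    gsub(/[34]/, "1", s)        # every 3 or 4 becomes 1
    n = length(s)
    printf "%s\t", $1           # ID followed by a tab
    for (i = 1; i <= n; i++)    # one character at a time, no array needed
      printf "%s%s", (i > 1 ? " " : ""), substr(s, i, 1)
    print ""                    # end the output line
  }' "$@"
}

# Shortened sample rows (made-up values, 6 characters instead of 50,000).
printf '%s\n' 'ID Class Values' '126 1 332204' '119 1 402224' | spread_substr
```

Printing character by character avoids building a second 100,000-character string in memory before output.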

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 anubhava
Solution 2 RavinderSingh13