How to replace part of a string with something else using awk?

Hello stackoverflow community,

I am new to awk and would like to ask the following question:

I have a file with 12 columns and ~7,000,000 rows that looks like this:

CHROM   POS ID  REF ALT A1  TEST    OBS_CT  BETA    SE  T_STAT  P
23  154918459   X:154918459:C:G C   G   G   ADD 1460    0.067883    0.039459    1.72034 0.0855842
1   54712   1:54712 TTTTC   T   ADD 1460    0.00428077  0.0561095   0.0762931   0.939196
1   825069  rs4475692   G   C   G   ADD 1460    -0.000411661    0.0413083   -0.00996558 0.99205
1   825410  rs13303179  G   A   G   ADD 1460    0.00489633  0.041967    0.116671    0.907137
23  154927183   X:154927183:C:T C   T   T   ADD 1460    0.0717408   0.080978    0.885931    0.375803

Column three comes in different formats, e.g. rs509981:154925045:C:T, X:154927183:C:T or 23:57937183:C:T.

I only want to change the occurrence of X: into 23:. For my example, the output should therefore look like this:

CHROM   POS ID  REF ALT A1  TEST    OBS_CT  BETA    SE  T_STAT  P
23  154918459   23:154918459:C:G    C   G   G   ADD 1460    0.067883    0.039459    1.72034 0.0855842
1   54712   1:54712 TTTTC   T   ADD 1460    0.00428077  0.0561095   0.0762931   0.939196
1   825069  rs4475692   G   C   G   ADD 1460    -0.000411661    0.0413083   -0.00996558 0.99205
1   825410  rs13303179  G   A   G   ADD 1460    0.00489633  0.041967    0.116671    0.907137
23  154927183   23:154927183:C:T    C   T   T   ADD 1460    0.0717408   0.080978    0.885931    0.375803

I tried the command below but it didn't work. I suspect this is because the command looks for X: as the whole value of the column rather than as part of a string. By the way, I am not sure whether "string" is the correct word.

awk 'NR > 1 && $3=="X:" {sub(/^X/,"23:")}1' file.txt > file2.txt

Any help will be greatly appreciated.

Avni.



Solution 1:[1]

Using sed

$ sed '1!s/\([^ ]* \)\{2\}X\(.*\)/\123\2/' input_file
CHROM   POS ID  REF ALT A1  TEST    OBS_CT  BETA    SE  T_STAT  P
23  154918459  23:154918459:C:G C   G   G   ADD 1460    0.067883    0.039459    1.72034 0.0855842
1   54712   1:54712 TTTTC   T   ADD 1460    0.00428077  0.0561095   0.0762931   0.939196
1   825069  rs4475692   G   C   G   ADD 1460    -0.000411661    0.0413083   -0.00996558 0.99205
1   825410  rs13303179  G   A   G   ADD 1460    0.00489633  0.041967    0.116671    0.907137
23  154927183  23:154927183:C:T C   T   T   ADD 1460    0.0717408   0.080978    0.885931    0.375803
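
The pattern above expects literal spaces between the fields, so it would not match if the real file is tab-separated. A possible variant is sketched below, assuming that sed -E (extended regular expressions, available in GNU and BSD sed) can be used and that fields are separated by runs of spaces or tabs. It anchors the match to the start of the line and keeps the original separators. The sample data and the variable names are made up for the demonstration.

```shell
# Build a small tab-separated sample in a temporary file (hypothetical data).
sample=$(mktemp)
printf 'CHROM\tPOS\tID\tREF\n23\t154918459\tX:154918459:C:G\tC\n1\t825069\trs4475692\tG\n' > "$sample"

# 1!  : skip the header line
# \1  : the first two fields with their separators, captured unchanged
# X:  : replaced by 23: only when it starts the third field
sed_out=$(sed -E '1!s/^([^[:space:]]+[[:space:]]+[^[:space:]]+[[:space:]]+)X:/\123:/' "$sample")
printf '%s\n' "$sed_out"
rm -f "$sample"
```

In the replacement, \1 is followed directly by the literal text 23:, which is why it reads as \123:.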

Solution 2:[2]

Suggested awk script:

awk '{sub("X:","23:",$3)}1' input.txt

Explanation for awk script:

{sub("X:","23:",$3)} In each line, replace the first occurrence of X: in the 3rd field with 23:.

1 Print each line, changed or not.
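
Two details are worth noting. The attempt in the question fails because $3=="X:" compares the whole third field with the two-character string X:, which is never equal to a value such as X:154918459:C:G, and because sub() without a third argument works on the whole line rather than on the field. In addition, assigning to $3 makes awk rebuild the line with the output field separator, so a changed line is rejoined with single spaces by default. The following is a minimal sketch that anchors the match to the start of the field and keeps tabs, assuming the file is tab-separated. The sample data, including the ID rsX:1, is made up to show the anchoring.

```shell
# Build a small tab-separated sample in a temporary file (hypothetical data).
sample=$(mktemp)
printf 'CHROM\tPOS\tID\tREF\n23\t154918459\tX:154918459:C:G\tC\n1\t825069\trsX:1\tG\n' > "$sample"

# FS = OFS = "\t" : read and write tab-separated fields, so rebuilt lines keep tabs
# NR > 1          : leave the header line untouched
# /^X:/           : only an X: at the very start of field 3 is replaced
awk_out=$(awk 'BEGIN { FS = OFS = "\t" } NR > 1 { sub(/^X:/, "23:", $3) } 1' "$sample")
printf '%s\n' "$awk_out"
rm -f "$sample"
```

If the real file is separated by spaces instead, the BEGIN block would need to be adjusted or dropped, at the cost of changed lines being rejoined with single spaces.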

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 HatLess
Solution 2 Dudi Boy