How to replace part of string to something else using awk?
Hello Stack Overflow community,
I am new to awk and would like to ask the following question:
I have a file with 12 columns and ~7,000,000 rows that looks like this:
CHROM POS ID REF ALT A1 TEST OBS_CT BETA SE T_STAT P
23 154918459 X:154918459:C:G C G G ADD 1460 0.067883 0.039459 1.72034 0.0855842
1 54712 1:54712 TTTTC T ADD 1460 0.00428077 0.0561095 0.0762931 0.939196
1 825069 rs4475692 G C G ADD 1460 -0.000411661 0.0413083 -0.00996558 0.99205
1 825410 rs13303179 G A G ADD 1460 0.00489633 0.041967 0.116671 0.907137
23 154927183 X:154927183:C:T C T T ADD 1460 0.0717408 0.080978 0.885931 0.375803
Column three (ID) comes in different formats, e.g. rs509981:154925045:C:T, X:154927183:C:T or 23:57937183:C:T.
I only want to change occurrences of X: into 23:. For my example, the output should therefore look like this:
CHROM POS ID REF ALT A1 TEST OBS_CT BETA SE T_STAT P
23 154918459 23:154918459:C:G C G G ADD 1460 0.067883 0.039459 1.72034 0.0855842
1 54712 1:54712 TTTTC T ADD 1460 0.00428077 0.0561095 0.0762931 0.939196
1 825069 rs4475692 G C G ADD 1460 -0.000411661 0.0413083 -0.00996558 0.99205
1 825410 rs13303179 G A G ADD 1460 0.00489633 0.041967 0.116671 0.907137
23 154927183 23:154927183:C:T C T T ADD 1460 0.0717408 0.080978 0.885931 0.375803
I tried the command below, but it didn't work. I suspect this is because the command looks for X: as the whole field rather than as part of a string. By the way, I am not sure whether string is the correct word.
awk 'NR > 1 && $3=="X:" {sub(/^X/,"23:")}1' file.txt > file2.txt
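The suspicion about the comparison can be checked directly: in awk, == compares the whole field as a string, while ~ matches a regular expression, so /^X:/ asks whether the field starts with X:. A small sketch using one data row from above (the variable names line, eq and re are only for illustration):

```shell
# One data row from the question; its third field is X:154927183:C:T
line='23 154927183 X:154927183:C:T C T T ADD 1460 0.0717408 0.080978 0.885931 0.375803'

# == compares the whole third field with the string "X:", so this gives "no match"
eq=$(echo "$line" | awk '{ if ($3 == "X:") print "match"; else print "no match" }')

# ~ matches a regular expression; ^X: means "starts with X:", so this gives "match"
re=$(echo "$line" | awk '{ if ($3 ~ /^X:/) print "match"; else print "no match" }')

echo "== : $eq"
echo "~  : $re"
```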
Any help will be greatly appreciated.
Avni.
Solution 1:[1]
Using sed (assuming the fields are separated by single spaces)
$ sed '1!s/^\(\([^ ]* \)\{2\}\)X:/\123:/' input_file
CHROM POS ID REF ALT A1 TEST OBS_CT BETA SE T_STAT P
23 154918459 23:154918459:C:G C G G ADD 1460 0.067883 0.039459 1.72034 0.0855842
1 54712 1:54712 TTTTC T ADD 1460 0.00428077 0.0561095 0.0762931 0.939196
1 825069 rs4475692 G C G ADD 1460 -0.000411661 0.0413083 -0.00996558 0.99205
1 825410 rs13303179 G A G ADD 1460 0.00489633 0.041967 0.116671 0.907137
23 154927183 23:154927183:C:T C T T ADD 1460 0.0717408 0.080978 0.885931 0.375803
Solution 2:[2]
Suggested awk script:
awk '{sub("X:","23:",$3)}1' input.txt
Explanation of the awk script:
{sub("X:","23:",$3)}: on each line, replace the first occurrence of X: in the 3rd field with 23:.
1: an always-true pattern with no action, so awk prints each line, changed or not.
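For comparison, the command from the question can be repaired with two changes: $3 ~ /^X:/ tests whether the third field starts with X: (== requires the field to be exactly X:), and passing $3 as the third argument makes sub() edit only that field (without it, sub() works on the whole line, $0). The pattern also includes the colon, because replacing only /^X/ with "23:" inside the field would give 23::154918459:C:G. A sketch that builds a small sample file from rows of the question; sample.txt and sample_out.txt are placeholder names, not files from the question:

```shell
# Placeholder input built from three rows of the question's data
printf '%s\n' \
  'CHROM POS ID REF ALT A1 TEST OBS_CT BETA SE T_STAT P' \
  '23 154918459 X:154918459:C:G C G G ADD 1460 0.067883 0.039459 1.72034 0.0855842' \
  '1 54712 1:54712 TTTTC T ADD 1460 0.00428077 0.0561095 0.0762931 0.939196' > sample.txt

# NR > 1       : skip the header line
# $3 ~ /^X:/   : only rows whose third field starts with X:
# sub(..., $3) : replace the leading X: with 23: in field 3 only
# 1            : print every line, changed or not
awk 'NR > 1 && $3 ~ /^X:/ { sub(/^X:/, "23:", $3) } 1' sample.txt > sample_out.txt

cat sample_out.txt
```

On the real data the same awk program would read file.txt and write file2.txt, as in the question. If that file is tab-separated, note that assigning to $3 makes awk rebuild the changed lines with the output field separator, which is a single space by default; starting the program with BEGIN { FS = OFS = "\t" } keeps the tabs.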
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | HatLess |
| Solution 2 | Dudi Boy |
