'How to shift a specific cell left in a text file

I have a very large text file (tab-delimited, first line is header) like this:

A1  A2  A3  A4  A5  A6  A7  
FA1 AB  234 231 0.02    456 I   
FA2 ACE 241 2154    0.1 324 O   
FA3 AD  AC  150 2367    0.02    123 I
FA  AFQ ASB 123 2473    0.4 630 I

As you can see, there are two strings in column 3 at the lines 3 and 4 columns 3 (A3). Could you please help me out with how I can delete these strings and shift cells left using awk, sed or any Linux codes to have the corrected file like:

A1  A2  A3  A4  A5  A6  A7
FA1 AB  234 231 0.02    456 I
FA2 ACE 241 2154    0.1 324 O
FA3 AD  150 2367    0.02    123 I
FA  AFQ 123 2473    0.4 630 I

I tried:

awk 'if($3!~/[0-9]+/') $3=$4}1', file

It removes any strings in column 3 and replaces them with column 4, but without shifting cells left.



Solution 1:[1]

Using sed

$ sed '1!s/^\([^ ]* \+[^ ]* \+\)[A-Z][^ ]* \+/\1/' input_file
A1  A2  A3  A4  A5  A6  A7
FA1 AB  234 231 0.02    456 I
FA2 ACE 241 2154    0.1 324 O
FA3 AD  150 2367    0.02    123 I
FA  AFQ 123 2473    0.4 630 I

1! - Do not match line 1

^\([^ ]* \+[^ ]* \+\) - Using backreferencing, we can store to memory the contents of a parenthesis, this will match everything up to the second space.

[A-Z][^ ]* \+ - Anything not within the parenthesis will be excluded from the matched return. If the third column contains capital alphabetic characters, then evrything up to the next space is excluded.

\1 - Return anything captured within the parenthesis

Solution 2:[2]

You may use this awk:

awk 'BEGIN{FS=OFS="\t"} NR > 1 && $3+0 != $3 {
$3 = ""; sub(FS FS, FS)} 1' file

A1   A2   A3   A4    A5    A6   A7
FA1  AB   234  231   0.02  456  I
FA2  ACE  241  2154  0.1   324  O
FA3  AD   150  2367  0.02  123  I
FA   AFQ  123  2473  0.4   630  I

Solution 3:[3]

This might work for you (GNU sed):

sed -E '1!s/^((\S+\s+){2})[A-Z]\S+\s+/\1/' file

Remove the third field and the space(s) following if it begins with a character A through Z.

Solution 4:[4]

$ awk -F'\t+' -v OFS='\t' 'NF>7{$3=""; $0=$0; $1=$1} 1' file
A1      A2      A3      A4      A5      A6      A7
FA1     AB      234     231     0.02    456     I
FA2     ACE     241     2154    0.1     324     O
FA3     AD      150     2367    0.02    123     I
FA      AFQ     123     2473    0.4     630     I

$ awk -v OFS='\t' '{print $1, $2, $(NF-4), $(NF-3), $(NF-2), $(NF-1), $NF}' file
A1      A2      A3      A4      A5      A6      A7
FA1     AB      234     231     0.02    456     I
FA2     ACE     241     2154    0.1     324     O
FA3     AD      150     2367    0.02    123     I
FA      AFQ     123     2473    0.4     630     I

Solution 5:[5]

Using cppawk:

cppawk '
#include <cons.h>  // numberp lives here
#include <field.h> // delf here
NR > 1 && !numberp($3) { delf(3); } 1' file
A1  A2  A3  A4  A5  A6  A7  
FA1 AB  234 231 0.02    456 I   
FA2 ACE 241 2154    0.1 324 O   
FA3 AD 150 2367 0.02 123 I
FA AFQ 123 2473 0.4 630 I

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 anubhava
Solution 3
Solution 4
Solution 5 Kaz