'How to shift a specific cell left in a text file
I have a very large text file (tab-delimited, first line is header) like this:
A1 A2 A3 A4 A5 A6 A7
FA1 AB 234 231 0.02 456 I
FA2 ACE 241 2154 0.1 324 O
FA3 AD AC 150 2367 0.02 123 I
FA AFQ ASB 123 2473 0.4 630 I
As you can see, there are two strings in column 3 at the lines 3 and 4 columns 3 (A3). Could you please help me out with how I can delete these strings and shift cells left using awk, sed or any Linux codes to have the corrected file like:
A1 A2 A3 A4 A5 A6 A7
FA1 AB 234 231 0.02 456 I
FA2 ACE 241 2154 0.1 324 O
FA3 AD 150 2367 0.02 123 I
FA AFQ 123 2473 0.4 630 I
I tried:
awk 'if($3!~/[0-9]+/') $3=$4}1', file
It removes any strings in column 3 and replaces them with column 4, but without shifting cells left.
Solution 1:[1]
Using sed
$ sed '1!s/^\([^ ]* \+[^ ]* \+\)[A-Z][^ ]* \+/\1/' input_file
A1 A2 A3 A4 A5 A6 A7
FA1 AB 234 231 0.02 456 I
FA2 ACE 241 2154 0.1 324 O
FA3 AD 150 2367 0.02 123 I
FA AFQ 123 2473 0.4 630 I
1!
- Do not match line 1
^\([^ ]* \+[^ ]* \+\)
- Using backreferencing, we can store to memory the contents of a parenthesis, this will match everything up to the second space.
[A-Z][^ ]* \+
- Anything not within the parenthesis will be excluded from the matched return. If the third column contains capital alphabetic characters, then evrything up to the next space is excluded.
\1
- Return anything captured within the parenthesis
Solution 2:[2]
You may use this awk
:
awk 'BEGIN{FS=OFS="\t"} NR > 1 && $3+0 != $3 {
$3 = ""; sub(FS FS, FS)} 1' file
A1 A2 A3 A4 A5 A6 A7
FA1 AB 234 231 0.02 456 I
FA2 ACE 241 2154 0.1 324 O
FA3 AD 150 2367 0.02 123 I
FA AFQ 123 2473 0.4 630 I
Solution 3:[3]
This might work for you (GNU sed):
sed -E '1!s/^((\S+\s+){2})[A-Z]\S+\s+/\1/' file
Remove the third field and the space(s) following if it begins with a character A
through Z
.
Solution 4:[4]
$ awk -F'\t+' -v OFS='\t' 'NF>7{$3=""; $0=$0; $1=$1} 1' file
A1 A2 A3 A4 A5 A6 A7
FA1 AB 234 231 0.02 456 I
FA2 ACE 241 2154 0.1 324 O
FA3 AD 150 2367 0.02 123 I
FA AFQ 123 2473 0.4 630 I
$ awk -v OFS='\t' '{print $1, $2, $(NF-4), $(NF-3), $(NF-2), $(NF-1), $NF}' file
A1 A2 A3 A4 A5 A6 A7
FA1 AB 234 231 0.02 456 I
FA2 ACE 241 2154 0.1 324 O
FA3 AD 150 2367 0.02 123 I
FA AFQ 123 2473 0.4 630 I
Solution 5:[5]
Using cppawk:
cppawk '
#include <cons.h> // numberp lives here
#include <field.h> // delf here
NR > 1 && !numberp($3) { delf(3); } 1' file
A1 A2 A3 A4 A5 A6 A7
FA1 AB 234 231 0.02 456 I
FA2 ACE 241 2154 0.1 324 O
FA3 AD 150 2367 0.02 123 I
FA AFQ 123 2473 0.4 630 I
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | anubhava |
Solution 3 | |
Solution 4 | |
Solution 5 | Kaz |