How to create a new column and rename the files in the second column to the new column in Linux?

Example data:

cat lookup.tsv
SRR7015874_1.fastq
SRR7015874_2.fastq
SRR7015875_1.fastq
SRR7015875_2.fastq
SRR7015876_1.fastq
SRR7015876_2.fastq
SRR7015877_1.fastq
SRR7015877_2.fastq

Using this command:

awk '{print $1 "\t" "SRR_" NR ".fastq"}' lookup.tsv > lookup_table.tsv

I get two columns:

SRR7015874_1.fastq   SRR_1.fastq
SRR7015874_2.fastq   SRR_2.fastq
SRR7015875_1.fastq   SRR_3.fastq
SRR7015875_2.fastq   SRR_4.fastq
SRR7015876_1.fastq   SRR_5.fastq
SRR7015876_2.fastq   SRR_6.fastq
SRR7015877_1.fastq   SRR_7.fastq
SRR7015877_2.fastq   SRR_8.fastq

Now I want to create a third column, like this:

SRR1_1.fastq
SRR1_2.fastq
SRR2_1.fastq
SRR2_2.fastq
SRR3_1.fastq
SRR3_2.fastq
SRR4_1.fastq
SRR4_2.fastq

And I want to use the second and third columns to rename files (i.e. if the filename = $2, change it to $3).

I tried:

cat lookup_table.tsv | while read c1 c2; do mv $c1 $c2 ; done
SRR1_1.fastq
SRR1_2.fastq
SRR2_1.fastq
SRR2_2.fastq
SRR3_1.fastq
SRR3_2.fastq

But this was not successful. Is there an error in my code/approach?



Solution 1:[1]

Does this solve your problem?

awk '{print $1 "\t" "SRR_" NR ".fastq"}' lookup.tsv > tmp
awk 'END{for (i=1; i<=4; i++) for (j=1; j<=2; j++) print "SRR" i "_" j ".fastq"}' tmp > third_column.txt
paste tmp third_column.txt > lookup_table.txt
cat lookup_table.txt
SRR7015874_1.fastq  SRR_1.fastq SRR1_1.fastq
SRR7015874_2.fastq  SRR_2.fastq SRR1_2.fastq
SRR7015875_1.fastq  SRR_3.fastq SRR2_1.fastq
SRR7015875_2.fastq  SRR_4.fastq SRR2_2.fastq
SRR7015876_1.fastq  SRR_5.fastq SRR3_1.fastq
SRR7015876_2.fastq  SRR_6.fastq SRR3_2.fastq
SRR7015877_1.fastq  SRR_7.fastq SRR4_1.fastq
SRR7015877_2.fastq  SRR_8.fastq SRR4_2.fastq

while read -r c1 c2 c3; do mv "$c2" "$c3"; done < lookup_table.txt
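
The loop above names three variables because `read` assigns any leftover fields to the last variable it is given. A minimal sketch of that behaviour, using one line of the table as sample input (the names `two_c2`, `three_c2` and `three_c3` are only for illustration):

```shell
#!/bin/sh
# One tab-separated line as it appears in lookup_table.txt.
line=$(printf 'SRR7015874_1.fastq\tSRR_1.fastq\tSRR1_1.fastq')

# With only two variables, c2 receives the rest of the line (columns 2 and 3).
read -r c1 c2 <<EOF
$line
EOF
two_c2=$c2

# With three variables, each column lands in its own variable.
read -r c1 c2 c3 <<EOF
$line
EOF
three_c2=$c2
three_c3=$c3

printf 'two vars:   c2=[%s]\n' "$two_c2"
printf 'three vars: c2=[%s] c3=[%s]\n' "$three_c2" "$three_c3"
```

So a `read c1 c2` loop run against a three-column table would pass columns 2 and 3 together as the target of `mv`, which is probably not what is wanted.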

Solution 2:[2]

You could get the data for the third column using NR and the modulo operator: i is incremented on every odd-numbered line (so once every 2 lines), and a second variable j is 1 on odd lines and 2 on even lines.

awk '{
  if (NR % 2 == 1) {++i; j=1} else {j=2}
  print $1 "\tSRR_" NR ".fastq\tSRR" i "_" j ".fastq"
}' lookup.tsv > lookup_table.tsv

The content in the file lookup_table.tsv is

SRR7015874_1.fastq  SRR_1.fastq SRR1_1.fastq
SRR7015874_2.fastq  SRR_2.fastq SRR1_2.fastq
SRR7015875_1.fastq  SRR_3.fastq SRR2_1.fastq
SRR7015875_2.fastq  SRR_4.fastq SRR2_2.fastq
SRR7015876_1.fastq  SRR_5.fastq SRR3_1.fastq
SRR7015876_2.fastq  SRR_6.fastq SRR3_2.fastq
SRR7015877_1.fastq  SRR_7.fastq SRR4_1.fastq
SRR7015877_2.fastq  SRR_8.fastq SRR4_2.fastq
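
As a possible alternative, the same two counters can be computed directly from NR with integer arithmetic instead of keeping state between lines. This is only a sketch: it assumes the files are listed strictly in pairs, and it works in a temporary directory (`workdir` is a name chosen for illustration) with the first four lines of the example data.

```shell
#!/bin/sh
# Recreate part of the example input in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' SRR7015874_1.fastq SRR7015874_2.fastq \
              SRR7015875_1.fastq SRR7015875_2.fastq > "$workdir/lookup.tsv"

# int((NR + 1) / 2) yields 1,1,2,2,... (the pair number)
# (NR - 1) % 2 + 1  yields 1,2,1,2,... (the position within the pair)
awk '{
  i = int((NR + 1) / 2)
  j = (NR - 1) % 2 + 1
  print $1 "\tSRR_" NR ".fastq\tSRR" i "_" j ".fastq"
}' "$workdir/lookup.tsv" > "$workdir/lookup_table.tsv"

cat "$workdir/lookup_table.tsv"
```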

To rename the files:

while read -r c1 c2 c3; do mv "$c2" "$c3"; done < lookup_table.tsv
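
Since renaming is hard to undo, it may be worth guarding the loop. The following is a sketch rather than a definitive version: it runs in a temporary directory with dummy files (`workdir` and the sample rows are assumptions for the demonstration), splits on tabs only, and skips any row whose source file is missing or whose target name is already taken.

```shell
#!/bin/sh
# Work in a scratch directory so no real data is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two dummy files named after column 2, plus a three-column table.
# The third row refers to SRR_3.fastq, which deliberately does not exist.
touch SRR_1.fastq SRR_2.fastq
printf 'SRR7015874_1.fastq\tSRR_1.fastq\tSRR1_1.fastq\n' >  lookup_table.tsv
printf 'SRR7015874_2.fastq\tSRR_2.fastq\tSRR1_2.fastq\n' >> lookup_table.tsv
printf 'SRR7015875_1.fastq\tSRR_3.fastq\tSRR2_1.fastq\n' >> lookup_table.tsv

# Split fields on tabs only, so names containing spaces stay intact.
while IFS="$(printf '\t')" read -r c1 c2 c3; do
  if [ ! -e "$c2" ]; then
    echo "skip: $c2 not found" >&2
    continue
  fi
  if [ -e "$c3" ]; then
    echo "skip: $c3 already exists" >&2
    continue
  fi
  # "--" stops option parsing in case a name starts with a dash.
  mv -- "$c2" "$c3"
done < lookup_table.tsv

ls
```

Replacing `mv --` with `echo mv --` turns this into a dry run that only prints the commands it would execute.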

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 jared_mamrot
Solution 2