Rearranging cells in a .tsv file

I have a .tsv file, which I have attached to this post. Its rows (cells) are in the order A1, A2, A3, ..., A12, B1, ..., B12, ..., H1, ..., H12. I need to rearrange them into the order A1, B1, C1, D1, ..., H1, A2, B2, C2, ..., H2, ..., A12, B12, C12, ..., H12. I need to do this using Python.

I have another .tsv file, called flipped.tsv, that I can compare this file against. The flipped.tsv file contains the correct well values for each cell. In other words, I must map the well values to their correct cell lines.

From what I understand, the cell lines in the metadata are incorrectly arranged in column-major order, whereas they should be arranged in row-major order, as they are in the flipped.tsv file.

For example: "A2 of flipped_metadata.tsv has the same well values as B1 of metadata.tsv."

What logic can I use to do this in Python?

Attachments: first .tsv file, flipped .tsv file

python csv

Solution 1:^[1]

You could do the following:

import csv

# Read the original file (newline="" is what the csv module recommends)
with open("file.tsv", "r", newline="") as file:
    rows = list(csv.reader(file, delimiter="\t"))

# Key function for sorting
def key_func(row):
    """Transform a row into a sort key, e.g. ['A7', '1', '2'] -> (7, 'A')."""
    return int(row[0][1:]), row[0][0]

# Write the flipped file: keep the original labels in the first column,
# but take the remaining values from the rows in sorted order
with open("file_flipped.tsv", "w", newline="") as file:
    csv.writer(file, delimiter="\t").writerows(
        row[:1] + flipped[1:]
        for row, flipped in zip(rows, sorted(rows, key=key_func))
    )

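The flipping is done by sorting the original rows by

1. first the integer part of their first entry, int(row[0][1:]), and
2. then the character part of their first entry, row[0][0].

The sorted rows are then paired with the original rows via zip, so that each original label in the first column receives the values of the row at the same position in the sorted order.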
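To make the relabelling concrete, here is a minimal sketch on in-memory data instead of a file. The 2x3 plate and its placeholder values ("a1", "b1", ...) are made up for illustration and are not taken from the attached files.

```python
def key_func(row):
    """Sort key: number part first, then the letter, e.g. ['B2', ...] -> (2, 'B')."""
    return int(row[0][1:]), row[0][0]

# Hypothetical 2x3 plate, listed as A1, A2, A3, B1, B2, B3.
# The second entry stands in for the well values of that row.
rows = [
    ["A1", "a1"], ["A2", "a2"], ["A3", "a3"],
    ["B1", "b1"], ["B2", "b2"], ["B3", "b3"],
]

# Sorted order is A1, B1, A2, B2, A3, B3.
reordered = sorted(rows, key=key_func)

# Keep the original labels, take the values from the sorted rows.
relabelled = [row[:1] + flipped[1:] for row, flipped in zip(rows, reordered)]

for line in relabelled:
    print(line)
```

In this sketch the label A2 ends up with the values that originally belonged to B1, which is the relationship described in the question.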

See tio.run illustration here.

If the effect of the sorting isn't obvious, take a look at the result of the same operation, just without the relabelling of the first column:

with open("file_flipped.tsv", "w", newline="") as file:
    csv.writer(file, delimiter="\t").writerows(
        sorted(rows, key=key_func)
    )

Output:

A1  26403   23273
B1  27792   8805
C1  5668    19510
...
F12 100 28583
G12 18707   14889
H12 13544   7447

The blocks are built based on the number part first, and within each block the lines run through the sorted characters.

This only works as long as the non-number part always has exactly one character.

If the non-number part always has exactly 2 characters, then the return value of the key function has to be adjusted to int(row[0][2:]), row[0][:2], and so on for longer fixed-length prefixes.
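As a small sketch of that adjustment, assuming every label starts with exactly two letters; the function name key_func_two_chars and the sample labels are hypothetical:

```python
def key_func_two_chars(row):
    """Sort key for two-letter prefixes, e.g. ['AB7', ...] -> (7, 'AB')."""
    return int(row[0][2:]), row[0][:2]

# Hypothetical rows with two-letter prefixes.
rows = [["AA1", "w"], ["AA2", "x"], ["AB1", "y"], ["AB2", "z"]]

ordered = [row[0] for row in sorted(rows, key=key_func_two_chars)]
print(ordered)  # ['AA1', 'AB1', 'AA2', 'AB2']
```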

If more variability is allowed, e.g. between 1 and 5 characters, then a regex approach would be more appropriate:

import re

re_key = re.compile(r"([a-zA-Z]+)(\d+)")

def key_func(row):
    """Transform a row into a sort key, e.g. ['Aa7', '10', '20'] -> (7, 2, 'Aa')."""
    word, number = re_key.match(row[0]).group(1, 2)
    return int(number), len(word), word

Here's a regex demo.

Depending on how the words have to be sorted, it might be necessary to include the length of the word in the sort key: Python sorts strings lexicographically, so ['B', 'AA', 'A'] becomes ['A', 'AA', 'B'] and not ['A', 'B', 'AA']. Adding the length before the word, as in the function above, achieves the latter order.
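The effect of the length component can be checked with a short comparison. This is only a sketch: the names key_with_length and key_without_length and the sample labels are made up for illustration.

```python
import re

re_key = re.compile(r"([a-zA-Z]+)(\d+)")

def key_with_length(row):
    """Sort by number, then word length, then the word itself."""
    word, number = re_key.match(row[0]).group(1, 2)
    return int(number), len(word), word

def key_without_length(row):
    """Sort by number, then the word (plain lexicographic order)."""
    word, number = re_key.match(row[0]).group(1, 2)
    return int(number), word

# Hypothetical labels mixing one- and two-letter prefixes.
rows = [["B1"], ["AA1"], ["A1"], ["B2"], ["AA2"], ["A2"]]

with_length = [row[0] for row in sorted(rows, key=key_with_length)]
without_length = [row[0] for row in sorted(rows, key=key_without_length)]

print(with_length)     # ['A1', 'B1', 'AA1', 'A2', 'B2', 'AA2']
print(without_length)  # ['A1', 'AA1', 'B1', 'A2', 'AA2', 'B2']
```

Which of the two orders is the right one depends on how the labels are meant to be read, so the choice of key is a judgment call about the data rather than something the code can decide.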

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1
