'Move values from column to column names in a pandas DataFrame

I have a dataframe like the one that follows:

import pandas as pd
import numpy as np

filename = [
    "f1",
    "f1",
    "f1",
    "f1",
    "f2",
    "f2",
    "f2",
    "f2",
    "f2",
    "f2"
]
parts = ["p1", "p1", "p2", "p2", "p1", "p1", "p2", "p2", "p3", "p3"]
groups = ["g1", "g2"] * 5
values = [1, 2, 3, 4, 10, 20, 30, 40, 50, 60]
values2 = [i + 1 for i in values]
df = pd.DataFrame.from_records({"filename": filename, "part":parts, "f1": values, "group": groups, "f2":values2})
df = df.set_index(["filename", "part"])
print(df)
               f1  f2 group
filename part              
f1       p1     1   2    g1
         p1     2   3    g2
         p2     3   4    g1
         p2     4   5    g2
f2       p1    10  11    g1
         p1    20  21    g2
         p2    30  31    g1
         p2    40  41    g2
         p3    50  51    g1
         p3    60  61    g2

Each file is subdivided into more parts, in this case two parts for f1 and three for f2. For each part, values f1 and f2 are extracted for a given number of features group.

I would like to obtain a new dataframe with: filename, part, f1_g1, f2_g2 i.e. I would like to remove the column group and move its values (g1 and g2) in the column names.

The solution I have implemented is:

import pandas as pd
import numpy as np

filename = [
    "f1",
    "f1",
    "f1",
    "f1",
    "f2",
    "f2",
    "f2",
    "f2",
    "f2",
    "f2"
]
parts = ["p1", "p1", "p2", "p2", "p1", "p1", "p2", "p2", "p3", "p3"]
groups = ["g1", "g2"] * 5
values = [1, 2, 3, 4, 10, 20, 30, 40, 50, 60]
values2 = [i + 1 for i in values]
df = pd.DataFrame.from_records({"filename": filename, "part":parts, "f1": values, "group": groups, "f2":values2})
df = df.set_index(["filename", "part"])
print(df)

df = df.reset_index()
groups = df.group.unique()
cols = [c for c in df.columns if (c not in ['filename', 'part', 'group'])]
records = []

for fn in df.filename.unique():
    parts = df.loc[df.filename == fn, "part"].unique()
    for p in parts:
        d = dict()
        for g in groups:
            d['filename'] = fn
            d['part'] = p
            for c in cols:
                value = df.loc[(df.filename == fn) & (df.part == p) & (df.group==g), c].iloc[0]
                d[f"{c}_{g}"] = value
        records.append(d)
df2 = pd.DataFrame.from_records(records).set_index(["filename", "part"])
print(df2)

that builds a DataFrame df2:

               f1_g1  f2_g1  f1_g2  f2_g2
filename part                            
f1       p1        1      2      2      3
         p2        3      4      4      5
f2       p1       10     11     20     21
         p2       30     31     40     41
         p3       50     51     60     61

My actual data has plenty more columns. This solution works, but I do not think it is elegant and the best one might think of.

Is there a better way to do it?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source