'Move values from column to column names in a pandas DataFrame
I have a dataframe like the one that follows:
import pandas as pd
import numpy as np
filename = [
"f1",
"f1",
"f1",
"f1",
"f2",
"f2",
"f2",
"f2",
"f2",
"f2"
]
parts = ["p1", "p1", "p2", "p2", "p1", "p1", "p2", "p2", "p3", "p3"]
groups = ["g1", "g2"] * 5
values = [1, 2, 3, 4, 10, 20, 30, 40, 50, 60]
values2 = [i + 1 for i in values]
df = pd.DataFrame.from_records({"filename": filename, "part":parts, "f1": values, "group": groups, "f2":values2})
df = df.set_index(["filename", "part"])
print(df)
f1 f2 group
filename part
f1 p1 1 2 g1
p1 2 3 g2
p2 3 4 g1
p2 4 5 g2
f2 p1 10 11 g1
p1 20 21 g2
p2 30 31 g1
p2 40 41 g2
p3 50 51 g1
p3 60 61 g2
Each file is subdivided into more parts, in this case two parts for f1 and three for f2. For each part, values f1 and f2 are extracted for a given number of features group.
I would like to obtain a new dataframe with:
filename, part, f1_g1, f2_g2
i.e. I would like to remove the column group and move its values (g1 and g2) in the column names.
The solution I have implemented is:
import pandas as pd
import numpy as np
filename = [
"f1",
"f1",
"f1",
"f1",
"f2",
"f2",
"f2",
"f2",
"f2",
"f2"
]
parts = ["p1", "p1", "p2", "p2", "p1", "p1", "p2", "p2", "p3", "p3"]
groups = ["g1", "g2"] * 5
values = [1, 2, 3, 4, 10, 20, 30, 40, 50, 60]
values2 = [i + 1 for i in values]
df = pd.DataFrame.from_records({"filename": filename, "part":parts, "f1": values, "group": groups, "f2":values2})
df = df.set_index(["filename", "part"])
print(df)
df = df.reset_index()
groups = df.group.unique()
cols = [c for c in df.columns if (c not in ['filename', 'part', 'group'])]
records = []
for fn in df.filename.unique():
parts = df.loc[df.filename == fn, "part"].unique()
for p in parts:
d = dict()
for g in groups:
d['filename'] = fn
d['part'] = p
for c in cols:
value = df.loc[(df.filename == fn) & (df.part == p) & (df.group==g), c].iloc[0]
d[f"{c}_{g}"] = value
records.append(d)
df2 = pd.DataFrame.from_records(records).set_index(["filename", "part"])
print(df2)
that builds a DataFrame df2:
f1_g1 f2_g1 f1_g2 f2_g2
filename part
f1 p1 1 2 2 3
p2 3 4 4 5
f2 p1 10 11 20 21
p2 30 31 40 41
p3 50 51 60 61
My actual data has plenty more columns. This solution works, but I do not think it is elegant and the best one might think of.
Is there a better way to do it?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
