Convert combinations of row+column as Column headers
I have a dataframe as follows:
Machine  Time   Part  PowerA  PowerB
1        20:30  1     0.1     0.4
1        20:30  2     0.9     0.7
1        20:31  1     0.3     0.1
1        20:31  2     0.2     0.3
2        20:30  1     0.2     0.5
2        20:31  1     0.8     0.4
I want it to be like:
Machine  Time   Part1_PowerA  Part1_PowerB  Part2_PowerA  Part2_PowerB
1        20:30  0.1           0.4           0.9           0.7
1        20:31  0.3           0.1           0.2           0.3
2        20:30  0.2           0.5           -1.0          -1.0
2        20:31  0.8           0.4           -1.0          -1.0
The objective is that I create a column for each Part and Power and fill the values as shown. Each machine has a variable number of parts, but the maximum is 8 (which would lead to columns Part8_PowerA and Part8_PowerB). When a machine doesn't have a certain part, the values for the Part_Power are filled with -1.
I have searched for solutions for quite a while, including this one, but I wasn't able to adapt them to my situation, where the new column names are a combination of a row value and an already existing column name.
Thanks!
Solution 1:[1]
You can use pivot in combination with string formatting for a fairly clean solution:
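out = (
    # index/columns are passed by keyword; pandas 2.0+ rejects positional arguments here
    df.pivot(index=['Machine', 'Time'], columns='Part')
      .fillna(-1)                    # missing parts become -1
      .sort_index(level=1, axis=1)   # order the columns by Part, then by Power
)
# each column label is a (Power, Part) tuple; flatten it to 'Part<n>_<Power>'
out.columns = out.columns.map('Part{0[1]}_{0[0]}'.format)
out = out.reset_index()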
print(out)
   Machine   Time  Part1_PowerA  Part1_PowerB  Part2_PowerA  Part2_PowerB
0        1  20:30           0.1           0.4           0.9           0.7
1        1  20:31           0.3           0.1           0.2           0.3
2        2  20:30           0.2           0.5          -1.0          -1.0
3        2  20:31           0.8           0.4          -1.0          -1.0
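To see why that format string works: after the pivot, each column label is a tuple of (value column, Part), and `{0[1]}` / `{0[0]}` index into that tuple. Here is a minimal sketch using hand-written labels (the tuples below are illustrative, not taken from a real pivot):

```python
import pandas as pd

# Column labels after the pivot are (value column, Part) tuples.
cols = pd.MultiIndex.from_tuples(
    [('PowerA', 1), ('PowerB', 1), ('PowerA', 2), ('PowerB', 2)]
)

# str.format receives the whole tuple as positional argument 0:
# {0[1]} picks the Part number, {0[0]} picks the Power column name.
flat = cols.map('Part{0[1]}_{0[0]}'.format)
print(list(flat))
```

Because the function returns plain strings rather than tuples, the result is a flat Index that can be assigned straight to `out.columns`.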
Alternatively, if you want a fully method-chained solution, you'll need .pipe so that .set_axis can see the current column labels, which costs a little extra code:
print(
    # index/columns are passed by keyword; pandas 2.0+ rejects positional arguments here
    df.pivot(index=['Machine', 'Time'], columns='Part')
      .fillna(-1)
      .sort_index(level=1, axis=1)
      .pipe(lambda d:
          d.set_axis(
              d.columns.map('Part{0[1]}_{0[0]}'.format),
              axis='columns'
          )
      )
      .reset_index()
)
   Machine   Time  Part1_PowerA  Part1_PowerB  Part2_PowerA  Part2_PowerB
0        1  20:30           0.1           0.4           0.9           0.7
1        1  20:31           0.3           0.1           0.2           0.3
2        2  20:30           0.2           0.5          -1.0          -1.0
3        2  20:31           0.8           0.4          -1.0          -1.0
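The question mentions that up to 8 parts are possible, but the pivot only creates columns for parts that actually occur in the data. If you always want Part1 through Part8, one possible approach is to reindex the pivoted columns before flattening them. This is a sketch under that assumption; `MAX_PARTS` and the list of power columns are names I introduced here:

```python
import pandas as pd

df = pd.DataFrame({
    'Machine': [1, 1, 1, 1, 2, 2],
    'Time': ['20:30', '20:30', '20:31', '20:31', '20:30', '20:31'],
    'Part': [1, 2, 1, 2, 1, 1],
    'PowerA': [0.1, 0.9, 0.3, 0.2, 0.2, 0.8],
    'PowerB': [0.4, 0.7, 0.1, 0.3, 0.5, 0.4],
})

MAX_PARTS = 8                   # assumption: upper bound stated in the question
POWERS = ['PowerA', 'PowerB']   # assumption: the value columns to spread

# Every (Power, Part) label that should exist, ordered by Part first.
full = pd.MultiIndex.from_tuples(
    [(power, part) for part in range(1, MAX_PARTS + 1) for power in POWERS]
)

wide = (
    df.pivot(index=['Machine', 'Time'], columns='Part')
      .reindex(columns=full)   # adds the missing Part columns as NaN
      .fillna(-1)              # absent parts become -1
)
wide.columns = wide.columns.map('Part{0[1]}_{0[0]}'.format)
out = wide.reset_index()
print(out.columns.tolist())
```

The reindex also fixes the column order, so the separate `sort_index` step is not needed in this variant.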
Solution 2:[2]
You can also use pd.pivot_table and then flatten the resulting column MultiIndex:
# note: pivot_table averages duplicate (Machine, Time, Part) rows by default
df = pd.pivot_table(df, values=['PowerA', 'PowerB'], index=['Machine', 'Time'], columns='Part')
# flatten the (Power, Part) MultiIndex into names such as 'Part1_PowerA'
df.columns = ['Part' + str(col[1]) + '_' + col[0] for col in df.columns.to_flat_index()]
df.reset_index(inplace=True)
df.fillna(-1, inplace=True)
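One difference worth knowing when choosing between `pivot` and `pivot_table`: if the same (Machine, Time, Part) combination appears more than once, `pivot` raises an error, while `pivot_table` silently aggregates (mean by default). A small sketch with made-up duplicate rows (this data is hypothetical, not from the question):

```python
import pandas as pd

# Hypothetical data: the same (Machine, Time, Part) combination appears twice.
dup = pd.DataFrame({
    'Machine': [1, 1],
    'Time': ['20:30', '20:30'],
    'Part': [1, 1],
    'PowerA': [0.25, 0.75],
    'PowerB': [0.5, 1.0],
})

# pivot refuses to reshape when the index/columns combination is not unique.
try:
    dup.pivot(index=['Machine', 'Time'], columns='Part')
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table aggregates the duplicates instead (default aggfunc is the mean).
averaged = pd.pivot_table(
    dup, values=['PowerA', 'PowerB'], index=['Machine', 'Time'], columns='Part'
)
print(pivot_failed)
print(averaged)
```

If duplicates would indicate a data problem in your case, the error from `pivot` may be the safer behaviour.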
Solution 3:[3]
One option is pivot_wider from pyjanitor, which can help abstract the reshaping process:
# pip install pyjanitor
import pandas as pd
import janitor
(df
 .pivot_wider(
     index = ['Machine', 'Time'],
     names_from = 'Part',
     names_glue = "Part{Part}_{_value}")
 .fillna(-1)
)
   Machine   Time  Part1_PowerA  Part2_PowerA  Part1_PowerB  Part2_PowerB
0        1  20:30           0.1           0.9           0.4           0.7
1        1  20:31           0.3           0.2           0.1           0.3
2        2  20:30           0.2          -1.0           0.5          -1.0
3        2  20:31           0.8          -1.0           0.4          -1.0
The names_glue argument offers a flexible way to shape the new column names. It uses Python's string format_map under the hood. _value is a placeholder for the names of the values columns (in this case the Power columns). Note that the relevant labels from the dataframe, _value and Part, are wrapped in {}.
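To get a feel for how such a template is filled in, here is a plain-Python sketch of `str.format_map` applied to the same glue string. It only illustrates the templating idea and is not pyjanitor's actual implementation; the dictionaries are built by hand:

```python
glue = "Part{Part}_{_value}"

# One mapping per (Part, values column) pair, built by hand for illustration.
pairs = [
    {'Part': 1, '_value': 'PowerA'},
    {'Part': 2, '_value': 'PowerA'},
    {'Part': 1, '_value': 'PowerB'},
    {'Part': 2, '_value': 'PowerB'},
]

# format_map looks up each {placeholder} by key in the mapping.
names = [glue.format_map(pair) for pair in pairs]
print(names)
```

Changing the template, for example to `"{_value}_P{Part}"`, changes the naming scheme without touching the reshaping itself.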
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Cameron Riddell |
| Solution 2 | SomeDude |
| Solution 3 | sammywemmy |
