'pandas reshape multiple columns fails with KeyError
For a pandas dataframe of:
defined by:
import pandas as pd
df = pd.DataFrame({'id':[1,2,3], 're_foo':[1,2,3], 're_bar':[4,5,6], 're_foo_baz':[0.4, 0.8, .9], 're_bar_baz':[.4,.5,.6], 'iteration':[1,2,3]})
display(df)
I want to reshape to the following format:
id, metric, value, iteration
1, foo , 1 , 1
1, bar , 4 , 1
1, foo_baz, 0.4 , 1
1, bar_baz, 0.4 , 0.4
...
A:
pd.wide_to_long(r, stubnames='re', i=['re_foo', 're_bar', 're_foo_baz', 're_bar_baz'], j='metric')
only results in a KeyError. How can I fix the reshape to work fine?
Solution 1:[1]
Here's a way using stack
:
# fix column name, remove re_
df.columns = df.columns.str.replace(r're_', '')
# reshape dataframe into required format
df = df.set_index(['id','iteration']).stack().reset_index().rename(columns={'level_2':'metric', 0: 'value'})
id iteration metric value
0 1 1 foo 1.0
1 1 1 bar 4.0
2 1 1 foo_baz 0.4
3 1 1 bar_baz 0.4
4 2 2 foo 2.0
5 2 2 bar 5.0
6 2 2 foo_baz 0.8
7 2 2 bar_baz 0.5
8 3 3 foo 3.0
9 3 3 bar 6.0
10 3 3 foo_baz 0.9
11 3 3 bar_baz 0.6
Solution 2:[2]
you can use pandas.melt and sort by id
df1 = df.melt(id_vars=['id', 'iteration'],
var_name="matric").sort_values(by=['id'])
df1.matric = df1.matric.str.replace('re_','')
>>>
id iteration matric value
0 1 1 foo 1.0
3 1 1 bar 4.0
6 1 1 foo_baz 0.4
9 1 1 bar_baz 0.4
1 2 2 foo 2.0
4 2 2 bar 5.0
7 2 2 foo_baz 0.8
10 2 2 bar_baz 0.5
2 3 3 foo 3.0
5 3 3 bar 6.0
8 3 3 foo_baz 0.9
11 3 3 bar_baz 0.6
edit: renaming the heading first is the right way
Solution 3:[3]
One option is to rename the columns before applying wide_to_long
:
df.columns = df.columns.str.replace('re','value')
(pd
.wide_to_long(
df,
stubnames = 'value',
sep='_',
suffix = '.+',
i = ['id', 'iteration'],
j = 'metric')
.reset_index()
)
id iteration metric value
0 1 1 foo 1.0
1 1 1 bar 4.0
2 1 1 foo_baz 0.4
3 1 1 bar_baz 0.4
4 2 2 foo 2.0
5 2 2 bar 5.0
6 2 2 foo_baz 0.8
7 2 2 bar_baz 0.5
8 3 3 foo 3.0
9 3 3 bar 6.0
10 3 3 foo_baz 0.9
11 3 3 bar_baz 0.6
The renaming could also be done after the transformation:
df = pd.DataFrame({'id':[1,2,3],
're_foo':[1,2,3],
're_bar':[4,5,6],
're_foo_baz':[0.4, 0.8, .9],
're_bar_baz':[.4,.5,.6],
'iteration':[1,2,3]})
(pd
.wide_to_long(
df,
stubnames = 're',
sep='_',
suffix = '.+',
i = ['id', 'iteration'],
j = 'metric')
.squeeze()
.rename('value')
.reset_index()
)
id iteration metric value
0 1 1 foo 1.0
1 1 1 bar 4.0
2 1 1 foo_baz 0.4
3 1 1 bar_baz 0.4
4 2 2 foo 2.0
5 2 2 bar 5.0
6 2 2 foo_baz 0.8
7 2 2 bar_baz 0.5
8 3 3 foo 3.0
9 3 3 bar 6.0
10 3 3 foo_baz 0.9
11 3 3 bar_baz 0.6
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | YOLO |
Solution 2 | |
Solution 3 | sammywemmy |