'pandas reshape multiple columns fails with KeyError

For a pandas dataframe of:

enter image description here

defined by:

import pandas as pd
df = pd.DataFrame({'id':[1,2,3], 're_foo':[1,2,3], 're_bar':[4,5,6], 're_foo_baz':[0.4, 0.8, .9], 're_bar_baz':[.4,.5,.6], 'iteration':[1,2,3]})
display(df)

I want to reshape to the following format:

id, metric, value, iteration
1, foo    , 1    , 1
1, bar    , 4    , 1
1, foo_baz, 0.4  , 1
1, bar_baz, 0.4  , 0.4
...

A:

pd.wide_to_long(r, stubnames='re', i=['re_foo', 're_bar', 're_foo_baz', 're_bar_baz'], j='metric')

only results in a KeyError. How can I fix the reshape to work fine?



Solution 1:[1]

Here's a way using stack:

# fix column name, remove re_
df.columns = df.columns.str.replace(r're_', '')

# reshape dataframe into required format
df = df.set_index(['id','iteration']).stack().reset_index().rename(columns={'level_2':'metric', 0: 'value'})

    id  iteration   metric  value
0    1          1      foo    1.0
1    1          1      bar    4.0
2    1          1  foo_baz    0.4
3    1          1  bar_baz    0.4
4    2          2      foo    2.0
5    2          2      bar    5.0
6    2          2  foo_baz    0.8
7    2          2  bar_baz    0.5
8    3          3      foo    3.0
9    3          3      bar    6.0
10   3          3  foo_baz    0.9
11   3          3  bar_baz    0.6

Solution 2:[2]

you can use pandas.melt and sort by id

df1 = df.melt(id_vars=['id', 'iteration'], 
              var_name="matric").sort_values(by=['id'])
df1.matric = df1.matric.str.replace('re_','')

>>>
    id  iteration   matric  value
0    1          1      foo    1.0
3    1          1      bar    4.0
6    1          1  foo_baz    0.4
9    1          1  bar_baz    0.4
1    2          2      foo    2.0
4    2          2      bar    5.0
7    2          2  foo_baz    0.8
10   2          2  bar_baz    0.5
2    3          3      foo    3.0
5    3          3      bar    6.0
8    3          3  foo_baz    0.9
11   3          3  bar_baz    0.6

edit: renaming the heading first is the right way

Solution 3:[3]

One option is to rename the columns before applying wide_to_long:

df.columns = df.columns.str.replace('re','value')

(pd
.wide_to_long(
    df, 
    stubnames = 'value', 
    sep='_', 
    suffix = '.+', 
    i = ['id', 'iteration'], 
    j = 'metric')
.reset_index()
)
    id  iteration   metric  value
0    1          1      foo    1.0
1    1          1      bar    4.0
2    1          1  foo_baz    0.4
3    1          1  bar_baz    0.4
4    2          2      foo    2.0
5    2          2      bar    5.0
6    2          2  foo_baz    0.8
7    2          2  bar_baz    0.5
8    3          3      foo    3.0
9    3          3      bar    6.0
10   3          3  foo_baz    0.9
11   3          3  bar_baz    0.6

The renaming could also be done after the transformation:

df = pd.DataFrame({'id':[1,2,3], 
                   're_foo':[1,2,3], 
                   're_bar':[4,5,6], 
                   're_foo_baz':[0.4, 0.8, .9], 
                   're_bar_baz':[.4,.5,.6], 
                   'iteration':[1,2,3]})

(pd
.wide_to_long(
    df, 
    stubnames = 're', 
    sep='_', 
    suffix = '.+', 
    i = ['id', 'iteration'], 
    j = 'metric')
.squeeze()
.rename('value')
.reset_index()
)
    id  iteration   metric  value
0    1          1      foo    1.0
1    1          1      bar    4.0
2    1          1  foo_baz    0.4
3    1          1  bar_baz    0.4
4    2          2      foo    2.0
5    2          2      bar    5.0
6    2          2  foo_baz    0.8
7    2          2  bar_baz    0.5
8    3          3      foo    3.0
9    3          3      bar    6.0
10   3          3  foo_baz    0.9
11   3          3  bar_baz    0.6

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 YOLO
Solution 2
Solution 3 sammywemmy