'Calculate new MultiIndex level from existing MultiIndex level values

For a DataFrame with two MultiIndex levels age and yearref, the goal is to add a new MultiIndex level yearconstr calculated as yearconstr = yearref - age.

import pandas as pd

df = pd.DataFrame({"value": [1, 2, 3]}, 
                  index=pd.MultiIndex.from_tuples([(10, 2015), (3, 2015), (2, 2016)], 
                                                  names=["age", "yearref"]))
print(df)

# input df:
             value
age yearref       
10  2015         1
3   2015         2
2   2016         3

We could reset the index, calculate a new column and then put the original index back in place plus the newly defined column, but surely there must be a better way.

df = (df.reset_index()
        .assign(yearconstr=lambda df: df.yearref - df.age)
        .set_index(list(df.index.names) + ["yearconstr"]))

print(df)

# expected result:
                        value
age yearref yearconstr       
10  2015    2005            1
3   2015    2012            2
2   2016    2014            3


Solution 1:[1]

For a concise and straight-forward approach, we can make use of

  • eval to generate a new Series calculated from the existing MultiIndex. This is easy since it treats index levels just like columns: df.eval("yearref - age")
  • rename the new Series
  • set_index to append the Series to df using the append=True argument.

Putting everything together:

df.set_index(df.eval("yearref - age").rename("yearconstr"), append=True)

# result:
                        value
age yearref yearconstr       
10  2015    2005            1
3   2015    2012            2
2   2016    2014            3

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 mcsoini