'Calculate new MultiIndex level from existing MultiIndex level values
For a DataFrame with two MultiIndex levels age and yearref, the goal is to add a new MultiIndex level yearconstr calculated as yearconstr = yearref - age.
import pandas as pd
df = pd.DataFrame({"value": [1, 2, 3]},
index=pd.MultiIndex.from_tuples([(10, 2015), (3, 2015), (2, 2016)],
names=["age", "yearref"]))
print(df)
# input df:
value
age yearref
10 2015 1
3 2015 2
2 2016 3
We could reset the index, calculate a new column and then put the original index back in place plus the newly defined column, but surely there must be a better way.
df = (df.reset_index()
.assign(yearconstr=lambda df: df.yearref - df.age)
.set_index(list(df.index.names) + ["yearconstr"]))
print(df)
# expected result:
value
age yearref yearconstr
10 2015 2005 1
3 2015 2012 2
2 2016 2014 3
Solution 1:[1]
For a concise and straight-forward approach, we can make use of
evalto generate a new Series calculated from the existing MultiIndex. This is easy since it treats index levels just like columns:df.eval("yearref - age")renamethe new Seriesset_indexto append the Series todfusing theappend=Trueargument.
Putting everything together:
df.set_index(df.eval("yearref - age").rename("yearconstr"), append=True)
# result:
value
age yearref yearconstr
10 2015 2005 1
3 2015 2012 2
2 2016 2014 3
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | mcsoini |
