Pandas DataFrame.rolling window with MultiIndex
How can I apply a DataFrame.rolling function and assign the values back to the original DataFrame?
E.g.,
import numpy as np
import pandas as pd
i = pd.MultiIndex.from_product([['A','B','C'], np.arange(1, 11, 1)], names=['Name','Num'])
df = pd.DataFrame(np.random.randn(30), i, columns=['Vals'])
window = df.groupby(['Name']).rolling(3, min_periods=3)
df['Window'] = window['Vals'].apply(lambda x: x[1] + x[2])
In this case, I want the rolling lambda to reset with each Name value, so Window would be NaN for the first 2 rows of each Name (since min_periods=3). However, assigning the result to a new column of the DataFrame at the end produces a column that is entirely NaN.
The window lambda itself computes the expected values, but I notice that the result has the group key added as an extra level of the MultiIndex. Note in the following that there are two levels called Name:
window['Vals'].apply(lambda x: x[1] + x[2]) # Returns:
Name  Name  Num
A     A     1          NaN
            2          NaN
            3    -2.408704
            4    -3.184169
            5     0.207093
            6     1.649017
            7     0.789064
            8     0.706335
            9    -0.487192
            10   -1.625869
B     B     1          NaN
            2          NaN
            3     0.023201
            4    -0.044582
            5     0.409526
Am I using these methods incorrectly, or failing to do something necessary to get the window.apply() to align with the DataFrame?
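One way to see what goes wrong, sketched here rather than taken from the answer below: the groupby-rolling result carries the group key as an extra outer level, so its (Name, Name, Num) index cannot line up with df's (Name, Num) index. Dropping that outer level with droplevel(0) before assigning restores the alignment. This sketch assumes a reasonably recent pandas, groups by level="Name" instead of ['Name'], and passes raw=True so the lambda receives a NumPy array and x[1], x[2] are positional.

```python
import numpy as np
import pandas as pd

# Same layout as in the question: three Names, each with Num = 1..10.
i = pd.MultiIndex.from_product([["A", "B", "C"], np.arange(1, 11)], names=["Name", "Num"])
df = pd.DataFrame(np.random.randn(30), i, columns=["Vals"])

# raw=True passes each window as a NumPy array, so x[1] and x[2] are positional.
rolled = (
    df.groupby(level="Name")["Vals"]
    .rolling(3, min_periods=3)
    .apply(lambda x: x[1] + x[2], raw=True)
)

# The group key is prepended to the original index: Name, Name, Num.
print(rolled.index.names)

# Drop the duplicated outer Name level so the index matches df's (Name, Num).
df["Window"] = rolled.droplevel(0)
print(df.head(4))
```

With the extra level removed, the first two rows of every Name stay NaN and the rest hold the per-Name window values.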
Solution 1:[1]
Here's an idea: swap the index levels, unstack Name into columns, compute the rolling window down each column, then stack back.
Swap the index levels:
df.set_index(df.index.swaplevel(),inplace=True)
Unstack the Name level (level 1 after the swap) into columns:
df = df.unstack(level=1)
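To make the intermediate shape concrete, here is a small sketch of the first two steps on the question's layout. Deterministic values 0..29 are substituted for the random data, an assumption made only so the result is easy to check. After the swap and unstack, each Name becomes its own column and each row is one Num.

```python
import numpy as np
import pandas as pd

i = pd.MultiIndex.from_product([["A", "B", "C"], np.arange(1, 11)], names=["Name", "Num"])
# Deterministic stand-in for the random data: A gets 0..9, B gets 10..19, C gets 20..29.
df = pd.DataFrame(np.arange(30, dtype=float), i, columns=["Vals"])

# Swap the levels so the index is (Num, Name).
swapped = df.set_index(df.index.swaplevel())
# Move Name (now level 1) into the columns.
wide = swapped.unstack(level=1)

print(wide.shape)          # 10 rows (one per Num) by 3 columns (one per Name)
print(list(wide.columns))  # column labels are ('Vals', Name) pairs
```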
Compute the rolling window sum down each column, then stack Name back into the index:
df.rolling(3,min_periods=3).sum().stack()
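The reason this resets per Name: after unstacking, every Name is a separate column, and rolling() works down each column independently, so a window never spans two Names. The sketch below (again with the deterministic 0..29 stand-in data as an assumption) checks that, and contrasts it with rolling directly on the long frame, where the first B row picks up A's last values.

```python
import numpy as np
import pandas as pd

i = pd.MultiIndex.from_product([["A", "B", "C"], np.arange(1, 11)], names=["Name", "Num"])
df = pd.DataFrame(np.arange(30, dtype=float), i, columns=["Vals"])  # A: 0..9, B: 10..19, C: 20..29

wide = df.set_index(df.index.swaplevel()).unstack(level=1)

# Each column is one Name, so the 3-row window restarts for every Name.
rolled_wide = wide.rolling(3, min_periods=3).sum()
print(rolled_wide.head(4))

# For contrast: on the long frame the window crosses from A into B.
bleed = df["Vals"].rolling(3, min_periods=3).sum()
print(bleed.loc[("B", 1)])  # A's Num 9 and 10 plus B's Num 1: 8 + 9 + 10 = 27.0
```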
Then swap the index levels back. In one go, that reads as:
# Your setup
import numpy as np
import pandas as pd
i = pd.MultiIndex.from_product([['A','B','C'], np.arange(1, 11, 1)], names=['Name','Num'])
df = pd.DataFrame(np.random.randn(30), i, columns=['Vals'])
# Idea: swap levels, unstack Name, roll down each column, stack, swap back
def swap_index(df):
    return df.set_index(df.index.swaplevel())
df2 = swap_index(swap_index(df).unstack(level=1).rolling(3, min_periods=3).sum().stack())
The output is:
In [1129]: df2
Out[1129]:
               Vals
Name Num
A    3     0.713198
B    3     1.040715
C    3     0.607588
A    4     0.992321
B    4     0.318793
C    4     0.716161
A    5     1.422214
B    5    -0.870407
C    5     0.440496
A    6    -0.496093
B    6     0.947817
C    6     0.163391
A    7     0.587050
B    7     1.594572
C    7     0.022100
A    8    -0.283013
B    8     4.316982
C    8    -0.312434
A    9     1.695324
B    9     1.100592
C    9    -0.002113
A    10    0.651203
B    10    1.077666
C    10   -0.268794
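The rows above come out ordered by Num rather than by Name, because the final swap does not re-sort the index. Below is a variant sketch of the same pipeline that uses DataFrame.swaplevel() in place of the swap_index helper and ends with sort_index() to group the rows by Name. One version-specific assumption: newer pandas releases change how stack() treats missing values (pandas 2.1+ may also emit a FutureWarning from it), so an explicit dropna() is added to drop the Num 1 and 2 rows either way. The deterministic 0..29 data is again a stand-in.

```python
import numpy as np
import pandas as pd

i = pd.MultiIndex.from_product([["A", "B", "C"], np.arange(1, 11)], names=["Name", "Num"])
df = pd.DataFrame(np.arange(30, dtype=float), i, columns=["Vals"])  # A: 0..9, B: 10..19, C: 20..29

df2 = (
    df.swaplevel()                      # index becomes (Num, Name)
    .unstack(level="Name")              # one column per Name
    .rolling(3, min_periods=3).sum()    # windows never cross Names
    .stack()                            # back to a long (Num, Name) index
    .dropna()                           # drop Num 1 and 2 in any pandas version
    .swaplevel()                        # back to (Name, Num)
    .sort_index()                       # group rows by Name, then Num
)
print(df2.head())
```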
If you want NaN for Num 1 and 2, you need to create an empty DataFrame from your original index and merge df2 onto it. That's still a bit of a pain, but maybe this already does the trick for you.
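A sketch of that suggestion, using join() as the index-based merge (an implementation choice, not the answer's exact code): build an empty frame on the original index and left-join df2 onto it, so Num 1 and 2 reappear as NaN. DataFrame.reindex(df.index) is a shorter alternative, not mentioned in the answer, that gives the same values. The deterministic data and the explicit dropna() after stack() are assumptions added so the result is stable across pandas versions.

```python
import numpy as np
import pandas as pd

i = pd.MultiIndex.from_product([["A", "B", "C"], np.arange(1, 11)], names=["Name", "Num"])
df = pd.DataFrame(np.arange(30, dtype=float), i, columns=["Vals"])  # A: 0..9, B: 10..19, C: 20..29

def swap_index(df):
    return df.set_index(df.index.swaplevel())

# The answer's pipeline; dropna() makes sure Num 1 and 2 are absent in any pandas version.
df2 = swap_index(
    swap_index(df).unstack(level=1).rolling(3, min_periods=3).sum().stack().dropna()
)

# Empty frame on the original (Name, Num) index, then a left join on the index.
restored = pd.DataFrame(index=df.index).join(df2)

# Alternative: reindex df2 against the original index directly.
restored_alt = df2.reindex(df.index)
print(restored.head(4))
```

Both results follow the original (Name, Num) order, so they can be assigned back to df column by column.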
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
