Pandas DataFrame.rolling window with MultiIndex

How can I apply a DataFrame.rolling function and assign the values back to the original DataFrame?

E.g.,

import numpy as np
import pandas as pd

i = pd.MultiIndex.from_product([['A','B','C'], np.arange(1, 11, 1)], names=['Name','Num'])
df = pd.DataFrame(np.random.randn(30), i, columns=['Vals'])
window = df.groupby(['Name']).rolling(3, min_periods=3)
df['Window'] = window['Vals'].apply(lambda x: x[1] + x[2])

In this case, I want the rolling lambda to reset with each Name value, so that Window is NaN for the first 2 rows of each Name (a window of 3 is not complete until the third row). However, assigning the result to a new column of the DataFrame produces a column that is entirely NaN.

The rolling lambda itself computes the values I expect, but I notice that the result has the grouping key added to the MultiIndex as an extra level. Note in the following that there are two levels called Name:

window['Vals'].apply(lambda x: x[1] + x[2]) # Returns:
Name  Name  Num
A     A     1           NaN
            2           NaN
            3     -2.408704
            4     -3.184169
            5      0.207093
            6      1.649017
            7      0.789064
            8      0.706335
            9     -0.487192
            10    -1.625869
B     B     1           NaN
            2           NaN
            3      0.023201
            4     -0.044582
            5      0.409526

Am I using these methods incorrectly, or failing to do something necessary to get the window.apply() to align with the DataFrame?
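
One way to read the output above: the groupby key is prepended to the original index, so the result has three index levels while df has two, and the assignment has nothing to align on. The sketch below is a minimal illustration rather than a definitive fix. It assumes a pandas version that prepends the group key as shown above, swaps the random data for deterministic values, and passes raw=True (an assumption, not part of the original code) so that x[1] and x[2] index a NumPy array by position.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random data in the question.
i = pd.MultiIndex.from_product([['A', 'B', 'C'], np.arange(1, 11)], names=['Name', 'Num'])
df = pd.DataFrame({'Vals': np.arange(30, dtype=float)}, index=i)

# raw=True hands each window to the lambda as a NumPy array,
# so x[1] and x[2] are positional lookups inside the window.
rolled = df.groupby('Name')['Vals'].rolling(3, min_periods=3).apply(
    lambda x: x[1] + x[2], raw=True
)

# The group key is prepended, so the two indexes have different depths.
print(rolled.index.nlevels, df.index.nlevels)

# Dropping the prepended level leaves a (Name, Num) index that matches df.index.
df['Window'] = rolled.droplevel(0)
print(df.head(4))
```

rolled.reset_index(level=0, drop=True) should have the same effect as droplevel(0) here.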



Solution 1:[1]

Here's an idea: swap the index levels, unstack so that each Name becomes its own column, compute the rolling window over the rows, then stack back.

Swap the index levels:

df.set_index(df.index.swaplevel(), inplace=True)

Unstack the Name level into columns:

df = df.unstack(level=1)

Compute the rolling window sum and restack.

df.rolling(3,min_periods=3).sum().stack()

Then swap the index levels back.
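
The four steps can be sketched with named intermediates, which makes it easier to inspect what each one produces. This is only an illustration under stated assumptions: the data is deterministic instead of random, the variable names (swapped, wide, rolled, stacked, result) are hypothetical, and DataFrame.swaplevel() is used in place of set_index(df.index.swaplevel()).

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random data in the question.
i = pd.MultiIndex.from_product([['A', 'B', 'C'], np.arange(1, 11)], names=['Name', 'Num'])
df = pd.DataFrame({'Vals': np.arange(30, dtype=float)}, index=i)

swapped = df.swaplevel()                        # index is now (Num, Name)
wide = swapped.unstack(level=1)                 # rows: Num, columns: (Vals, Name)
rolled = wide.rolling(3, min_periods=3).sum()   # each Name column rolls on its own
stacked = rolled.stack()                        # back to a (Num, Name) row index
result = stacked.swaplevel()                    # restore the (Name, Num) order

print(wide.shape)
print(result.head(6))
```

Because every Name sits in its own column of wide, the rolling window never mixes values from different names. Depending on the pandas version, stack() may or may not drop the rows that are entirely NaN.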

In one go that reads as:

# Your setup
import numpy as np
import pandas as pd
i = pd.MultiIndex.from_product([['A','B','C'], np.arange(1, 11, 1)], names=['Name','Num'])
df = pd.DataFrame(np.random.randn(30), i, columns=['Vals'])

# Idea
def swap_index(df):
    return df.set_index(df.index.swaplevel())

df2 = swap_index(swap_index(df).unstack(level=1).rolling(3, min_periods=3).sum().stack())

The output is

In [1129]: df2
Out[1129]:
              Vals
Name Num
A    3    0.713198
B    3    1.040715
C    3    0.607588
A    4    0.992321
B    4    0.318793
C    4    0.716161
A    5    1.422214
B    5   -0.870407
C    5    0.440496
A    6   -0.496093
B    6    0.947817
C    6    0.163391
A    7    0.587050
B    7    1.594572
C    7    0.022100
A    8   -0.283013
B    8    4.316982
C    8   -0.312434
A    9    1.695324
B    9    1.100592
C    9   -0.002113
A    10   0.651203
B    10   1.077666
C    10  -0.268794

The rows for Num 1 and 2 are missing because stack() dropped the all-NaN rows here. If you want the NaNs for 1 and 2, you need to create an empty DataFrame from your original index and merge the result with it. That's still a bit of a pain, but maybe this already does the trick for you.
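
One possible way to get those NaN rows back without building a separate empty frame is to reindex the result on the original index. The sketch below assumes deterministic data in place of the random values, and the names rolled and Window are chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random data in the question.
i = pd.MultiIndex.from_product([['A', 'B', 'C'], np.arange(1, 11)], names=['Name', 'Num'])
df = pd.DataFrame({'Vals': np.arange(30, dtype=float)}, index=i)

# Same pipeline as in the answer, written as one chain.
rolled = (
    df.swaplevel()
      .unstack(level=1)
      .rolling(3, min_periods=3)
      .sum()
      .stack()
      .swaplevel()
)

# Reindexing on the original index restores the original row order and
# fills any rows that are absent from `rolled` with NaN.
df['Window'] = rolled['Vals'].reindex(df.index)
print(df.head(4))
```

Since column assignment aligns on the index anyway, the explicit reindex mainly documents the intent; it also gives the same result whether or not stack() dropped the NaN rows.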

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1