DataFrame groupby and rolling

I have the following test DataFrame to groupby id and calculate the price momentum over a rolling 2-day period:

import pandas as pd

df = pd.DataFrame()
df['date'] = ['20220101', '20220101', '20220101', '20220102', '20220102', '20220102', '20220103', '20220103', '20220103']
df['id'] = ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c']
df['price'] = [10, 10, 10, 15, 15, 15, 20, 20, 20]
df.set_index('date', inplace=True)
df.index = pd.to_datetime(df.index, format='%Y%m%d')
df.sort_values(['id', 'date'], inplace=True)
df
            id  price
date        
2022-01-01  a   10
2022-01-02  a   15
2022-01-03  a   20
2022-01-01  b   10
2022-01-02  b   15
2022-01-03  b   20
2022-01-01  c   10
2022-01-02  c   15
2022-01-03  c   20

To get the price momentum over a 2-day rolling window per id, I found two solutions, which appear as 'momentum' and 'momentum2' in the following code. 'momentum' is what I use on my real dataset, since it computes much faster and I am handling roughly 2 million rows in my df.

rolled = df.groupby('id').rolling(2)
df['momentum'] = rolled.sum().sort_values(['id', 'date'])['price'].values
df['momentum2'] = df.groupby('id')['price'].apply(lambda x: x.rolling(2).sum())
df
           id  price  momentum  momentum2
date                                     
2022-01-01  a     10       NaN        NaN
2022-01-02  a     15      25.0       25.0
2022-01-03  a     20      35.0       35.0
2022-01-01  b     10       NaN        NaN
2022-01-02  b     15      25.0       25.0
2022-01-03  b     20      35.0       35.0
2022-01-01  c     10       NaN        NaN
2022-01-02  c     15      25.0       25.0
2022-01-03  c     20      35.0       35.0

On the test df this works perfectly, as expected. On my real dataset, however, the 'momentum' method appears to be buggy: for many ids the very first observations already have momentum values where NaN would be expected. It seems the matching of rolled back onto df does not work correctly, even though both were sorted in the same manner. What could be the reason? Are there other options to do this efficiently?
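As a minimal sketch of the suspected failure mode (this is an illustration constructed for this answer, not the asker's real data): assigning with `.values` is purely positional, so it silently misaligns whenever the row order of the rolled result differs from the row order of `df`. An index-aware `transform` avoids the problem entirely.

```python
import pandas as pd

# df is deliberately NOT sorted by (id, date); groupby will still emit
# its result sorted by group key, so positional assignment misaligns.
df = pd.DataFrame(
    {"id": ["b", "a", "b", "a"], "price": [1, 2, 3, 4]},
    index=pd.to_datetime(["2022-01-01", "2022-01-01", "2022-01-02", "2022-01-02"]),
)
df.index.name = "date"

# Rolled result is ordered a,a,b,b while df rows are ordered b,a,b,a.
rolled = df.groupby("id").rolling(2).sum()
df["momentum_wrong"] = rolled["price"].values  # values land on the wrong rows

# transform places each value back on its original row, regardless of order.
df["momentum_ok"] = df.groupby("id")["price"].transform(lambda s: s.rolling(2).sum())
```

Here `momentum_wrong` puts 6.0 on the first 'a' row (which should be NaN), while `momentum_ok` is correct on every row.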



Solution 1:[1]

df['momentum'] = df.groupby('id')['price'].transform(lambda y: y.rolling(2, min_periods=1).sum())

Output

           id  price  momentum
date                          
2022-01-01  a     10        10
2022-01-02  a     15        25
2022-01-03  a     20        35
2022-01-01  b     10        10
2022-01-02  b     15        25
2022-01-03  b     20        35
2022-01-01  c     10        10
2022-01-02  c     15        25
2022-01-03  c     20        35
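One caveat worth flagging (my note, not part of the original answer): `min_periods=1` changes the semantics slightly. The first row of each id now gets its own price as the "momentum" rather than NaN, as in the asker's original output. Dropping `min_periods` restores the NaN behavior:

```python
import pandas as pd

df = pd.DataFrame(
    {"id": ["a", "a", "a"], "price": [10, 15, 20]},
    index=pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-03"]),
)
df.index.name = "date"

# min_periods=1 emits a value from the first row onward: 10, 25, 35
df["momentum"] = df.groupby("id")["price"].transform(
    lambda s: s.rolling(2, min_periods=1).sum()
)
# Default min_periods keeps NaN until the window is full: NaN, 25, 35
df["momentum_nan"] = df.groupby("id")["price"].transform(
    lambda s: s.rolling(2).sum()
)
```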
         

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
