DataFrame groupby and rolling
I have the following test DataFrame, which I want to group by id to calculate the price momentum over a rolling 2-day window:
import pandas as pd
df = pd.DataFrame()
df['date'] = ['202211', '202211', '202211', '202212', '202212', '202212', '202213', '202213', '202213']
df['id'] = ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c']
df['price'] = [10, 10, 10, 15, 15, 15, 20, 20, 20]
df.set_index('date', inplace=True)
df.index = pd.to_datetime(df.index, format='%Y%m%d')
df.sort_values(['id', 'date'], inplace=True)
df
id price
date
2022-01-01 a 10
2022-01-02 a 15
2022-01-03 a 20
2022-01-01 b 10
2022-01-02 b 15
2022-01-03 b 20
2022-01-01 c 10
2022-01-02 c 15
2022-01-03 c 20
To get the price momentum over a 2-day rolling window per id, I found two solutions, shown as 'momentum' and 'momentum2' in the following code. I use 'momentum' on my real dataset because it is much faster to compute and my df has roughly 2 million rows.
rolled = df.groupby('id').rolling(2)
df['momentum'] = rolled.sum().sort_values(['id', 'date'])['price'].values
df['momentum2'] = df.groupby('id')['price'].apply(lambda x: x.rolling(2).sum())
df
id price momentum momentum2
date
2022-01-01 a 10 NaN NaN
2022-01-02 a 15 25.0 25.0
2022-01-03 a 20 35.0 35.0
2022-01-01 b 10 NaN NaN
2022-01-02 b 15 25.0 25.0
2022-01-03 b 20 35.0 35.0
2022-01-01 c 10 NaN NaN
2022-01-02 c 15 25.0 25.0
2022-01-03 c 20 35.0 35.0
On the test df this works as expected. On my real dataset, however (using the 'momentum' method), the result looks wrong: for many ids the first observations already have momentum values where NaN would be expected. It seems that the values from rolled are not matched to the right rows, even though rolled has been sorted in the same way as df. What could be the reason? Are there other ways to do this efficiently?
Solution 1:[1]
df['momentum'] = df.groupby('id')['price'].transform(lambda y: y.rolling(2, min_periods=1).sum())
Output
id price momentum
date
2022-01-01 a 10 10
2022-01-02 a 15 25
2022-01-03 a 20 35
2022-01-01 b 10 10
2022-01-02 b 15 25
2022-01-03 b 20 35
2022-01-01 c 10 10
2022-01-02 c 15 25
2022-01-03 c 20 35
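Note that min_periods=1 makes the first row of each id equal to its own price (10 above) instead of NaN. If NaN is wanted there, as in the question, the default min_periods can be left in place. Also, assigning through .values is purely positional, so any difference in row order between rolled and df silently shifts values across ids. The following is a minimal sketch, not a definitive fix: it assumes the frame can be given a unique index with reset_index, and the column names momentum_transform and momentum_rolling are made up for illustration. It lets pandas align the groupby-rolling result by index label rather than by position.

```python
import pandas as pd

# Same test data as in the question, built with 'date' as a regular column.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2022-01-01"] * 3 + ["2022-01-02"] * 3 + ["2022-01-03"] * 3
    ),
    "id": ["a", "b", "c"] * 3,
    "price": [10, 10, 10, 15, 15, 15, 20, 20, 20],
})

# Sort within each id and give every row a unique integer label
# (assumption: the real data can be re-indexed this way).
df = df.sort_values(["id", "date"]).reset_index(drop=True)

# Variant 1: transform with the default min_periods, so the first row
# of each id stays NaN.
df["momentum_transform"] = (
    df.groupby("id")["price"].transform(lambda s: s.rolling(2).sum())
)

# Variant 2: groupby + rolling returns a Series indexed by (id, original label).
# Dropping the id level leaves the unique original labels, so the assignment
# aligns by label instead of relying on row order.
rolled = df.groupby("id")["price"].rolling(2).sum()
df["momentum_rolling"] = rolled.droplevel(0)

print(df)
```

Both columns should agree on the test data. Which variant is faster on roughly 2 million rows would need to be measured on the real dataset.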
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
