Rolling Regression Residuals Python
I hope you can help me with my problem. I want to run a rolling regression on a dataframe in Python and calculate the standard deviation of only part of the residuals.
For example, in the table below I want to estimate the parameters over a moving window (e.g. Y = [5,7,9,10] on X_1 = [1,2,4,5] and X_2 = [2,3,4,4]), which results in intercept = 2.4, B_1 = 0.7 and B_2 = 1. These estimates lead to residuals = [4.8,0.5,-0.2,-0.2], and the standard deviation is computed from only the last 3 residuals [0.5,-0.2,-0.2]; that value should be passed to the column ["Standard deviation"].
| Index | Y | X_1 | X_2 | Standard deviation |
|---|---|---|---|---|
| 0 | 5 | 1 | 2 | 0.404145188 |
| 1 | 7 | 2 | 3 | 2.081665999 |
| 2 | 9 | 4 | 4 | 2.511132239 |
| 3 | 10 | 5 | 4 | 0.864408264 |
| 4 | 11 | 6 | 2 | nan |
| 5 | 14 | 5 | 5 | nan |
| 6 | 17 | 7 | 6 | nan |
My original dataset is huge, so I want to avoid a for loop. My approach so far is to run a regression for each row using the following function (which does not give the desired result):
import statsmodels.api as sm
df["Standard deviation"] = df.rolling(window=4).apply(lambda x: (df["Y"] - sm.OLS(df["Y"], df[["X_1", "X_2"]]).fit().predict()).std())
However, the function fits the regression on the entire column each time, so it is not a rolling regression, and I could not find a way to calculate the standard deviation from only the last 3 residuals.
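One possible way to get a loop-free rolling fit is sketched below. It is not the statsmodels approach from the attempt above: it swaps in NumPy's `sliding_window_view` to build all windows at once and a batched pseudo-inverse (`np.linalg.pinv`) to solve every least-squares problem in one call. The function name `rolling_residual_std` and its parameters `window` and `last_n` are hypothetical. The sketch assumes an intercept is wanted, that the sample standard deviation (`ddof=1`) is meant, and that each result is stored on the first row of its window, as in the table.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def rolling_residual_std(df, y_col, x_cols, window=4, last_n=3):
    """Std of the last `last_n` OLS residuals of each rolling window (hypothetical helper)."""
    y = df[y_col].to_numpy(dtype=float)
    # Design matrix with an explicit intercept column (assumption: intercept wanted).
    X = np.column_stack([np.ones(len(df)), df[x_cols].to_numpy(dtype=float)])

    # y_win: (n_windows, window); X_win: (n_windows, window, n_params).
    y_win = sliding_window_view(y, window)
    X_win = sliding_window_view(X, window, axis=0).transpose(0, 2, 1)

    # Least-squares coefficients for all windows at once via the batched pseudo-inverse.
    beta = np.linalg.pinv(X_win) @ y_win[..., None]
    resid = y_win - (X_win @ beta)[..., 0]

    # Assumption: store each value on the first row of its window, as in the table.
    out = np.full(len(df), np.nan)
    out[: resid.shape[0]] = resid[:, -last_n:].std(axis=1, ddof=1)
    return pd.Series(out, index=df.index)


df = pd.DataFrame({
    "Y": [5, 7, 9, 10, 11, 14, 17],
    "X_1": [1, 2, 4, 5, 6, 5, 7],
    "X_2": [2, 3, 4, 4, 2, 5, 6],
})
df["Standard deviation"] = rolling_residual_std(df, "Y", ["X_1", "X_2"])
```

For the first window, a least-squares fit with the stated coefficients (2.4, 0.7, 1) gives residuals of about [-0.1, 0.2, -0.2, 0.1], so this sketch returns roughly 0.208 for that row rather than the figure shown in the table. To align results with the last row of each window instead (the pandas `rolling` convention), fill `out[window - 1:]` rather than `out[:resid.shape[0]]`. The batched pseudo-inverse holds all windows in memory at once, so a very large dataset may need to be processed in chunks.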
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
