'Pandas rolling average of a columns of dates
I'm trying to calculate the rolling average of a column of datetime
objects. In my scenario, the input data are the last day below freezing each year for ~100 years.
Ignoring the years, I want to take a rolling average over the month-day rows. The output would be the average last day below freezing over some window of years.
My final plot will be years on the x-axis and the rolling average of the last day below freezing on the y-axis.
Here is a minimal setup of rows I want to average over.
df = pd.Series(['1900-05-04','1901-05-03', ..., '1910-04-30'])
df = pd.to_datetime(df)
>>> df
87 1900-05-04
88 1901-05-03
89 1902-05-06
90 1903-05-01
91 1904-05-03
92 1905-04-29
93 1906-05-03
94 1907-05-03
95 1908-05-04
96 1909-04-30
97 1910-04-30
dtype: datetime64[ns]
Desired output (with or without the years) with a window size of two would be:
1 1901-05-04
2 1902-05-05
3 1903-05-04
4 1904-05-02
5 1905-05-01
6 1906-05-03
7 1907-05-04
8 1908-05-02
9 1909-04-30
10 1910-04-30
Solution 1:[1]
Here is a hack! It should be taken as inspiration and mostly like there exist a better one as it’s not optimal
pd.to_datetime(pd.DataFrame((df.loc[1:].dt.year, df.loc[1:].dt.month,
df.groupby(df.dt.month).transform(lambda d: d.dt.day.rolling(2).mean().round(0)).loc[1:].astype(int))).T.rename(columns={0:'year',1:'month',2:'day'}))
I group by month and took a rolling mean. I rebuilt the date by assembling the year, month and rolled day.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Prayson W. Daniel |