'moving average within a resampled set
I have a dataframe, df, of a multiple-year hourly dataset and want to calculate 8-hour moving average within each day. Thank you to PatrickT for helpful discussion, I tried the following:
import pandas as pd
import random
numbers = [random.randint(10, 80) for i in range(48)]
datetimes = pd.date_range(start="2021-08-01 00:00:00", end="2021-08-2 23:00:00", freq='1h')
df = pd.DataFrame (data = numbers, index = datetimes, columns = ['var'])
df.resample('D')['var'].apply(lambda x: x.rolling(8).mean())
And it worked!
My initial problem was I used "df.resample('D')['var'].rolling(8).mean()", which gave me an error of " index 2 is out of bounds for axis 0 with size 2". Inspired by discussion with PatrickT combined with internet search, I used apply() and it worked! Though I still don't know why resample() cannot be followed by rolling().
Solution 1:[1]
I'm no expert and will defer to anyone with a better answer. It seems that .pad() helps:
import pandas as pd
import random
datetimes = pd.date_range(start="2019-08-01 00:00:00", end="2021-08-2 23:00:00", freq='1h')
numbers = [random.randint(10, 80) for i in range(len(datetimes))]
df = pd.DataFrame ({'data': numbers, 'index': datetimes})
df.head()
df.set_index('index').resample('D').pad().rolling(window=8).mean()
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | PatrickT |
