'moving average within a resampled set

I have a dataframe, df, of a multiple-year hourly dataset and want to calculate 8-hour moving average within each day. Thank you to PatrickT for helpful discussion, I tried the following:

import pandas as pd
import random

numbers = [random.randint(10, 80) for i in range(48)]
datetimes = pd.date_range(start="2021-08-01 00:00:00", end="2021-08-2 23:00:00", freq='1h')
df = pd.DataFrame (data = numbers, index = datetimes, columns = ['var'])
df.resample('D')['var'].apply(lambda x: x.rolling(8).mean())

And it worked!

My initial problem was I used "df.resample('D')['var'].rolling(8).mean()", which gave me an error of " index 2 is out of bounds for axis 0 with size 2". Inspired by discussion with PatrickT combined with internet search, I used apply() and it worked! Though I still don't know why resample() cannot be followed by rolling().



Solution 1:[1]

I'm no expert and will defer to anyone with a better answer. It seems that .pad() helps:

import pandas as pd
import random

datetimes = pd.date_range(start="2019-08-01 00:00:00", end="2021-08-2 23:00:00", freq='1h')
numbers = [random.randint(10, 80) for i in range(len(datetimes))]
df = pd.DataFrame ({'data': numbers, 'index': datetimes})
df.head()

df.set_index('index').resample('D').pad().rolling(window=8).mean()

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 PatrickT