Passing a pandas rolling aggregation method as a function argument
I'd like to wrap a group-by / rolling / aggregation sequence in a function that takes the aggregation method itself (such as mean or std) as an argument, as in the code below:
df = pd.DataFrame({'date': ['2020-01-13', '2020-09-19', '2021-05-10', '2022-02-01'],
                   'provider': ['A', 'B', 'A', 'B'],
                   'points': [10, 2, 1, 8]})

def provider_rolling_window(df,ind,window_size,agg_method):
    s = df.sort_values(by=['provider','date'], ascending=True)\
          .groupby(['provider'])[ind]\
          .rolling(window_size, min_periods = 1)\
          .agg_method\
          .reset_index(drop=True,level=0)
    return(s)

df['moving_avg_3'] = provider_rolling_window(df,'points',3,mean)
However, the interpreter rejects this:
---> 14 df['moving_avg_3'] = provider_rolling_window(df,'points',3,mean)
NameError: name 'mean' is not defined
Even if I try:
f = pd.groupby.rolling.mean
df['moving_avg_3'] = provider_rolling_window(df,'points',3,f)
It still complains:
AttributeError: module 'pandas' has no attribute 'groupby'
Is there a proper way to go about this?
In other words, is there a way to pass an argument that gives the same result as hard-coding the method (e.g. mean() or std()) in the function?
Solution 1:[1]
First of all, you need to pass a function that actually exists, such as np.mean. The bare name mean is not defined in Python itself.
The way to do this is to use the rolling object's apply method, which calls the function you pass on each window. So your function would look like this:
import numpy as np

def provider_rolling_window(df, ind, window_size, agg_method):
    # agg_method is any callable that reduces one window to a single value
    s = df.sort_values(by=['provider','date'], ascending=True)\
          .groupby(['provider'])[ind]\
          .rolling(window_size, min_periods = 1)\
          .apply(agg_method)\
          .reset_index(drop=True,level=0)
    return(s)

# df is the DataFrame defined in the question
df['moving_avg_3'] = provider_rolling_window(df, 'points', 3, np.mean)
print(df)
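Because apply accepts any callable that reduces a window to one number, the same wrapper also covers aggregations that have no built-in rolling method. The following is a minimal sketch under a few assumptions: the function name provider_rolling_apply and the column moving_range_3 are made up for illustration, and raw=True is an optional choice that hands each window to the callable as a NumPy array rather than a Series.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'date': ['2020-01-13', '2020-09-19', '2021-05-10', '2022-02-01'],
                   'provider': ['A', 'B', 'A', 'B'],
                   'points': [10, 2, 1, 8]})

def provider_rolling_apply(df, ind, window_size, agg_func):
    # Hypothetical helper: agg_func is called once per window.
    # raw=True passes each window as a NumPy array instead of a Series.
    return (
        df.sort_values(by=['provider', 'date'])
          .groupby('provider')[ind]
          .rolling(window_size, min_periods=1)
          .apply(agg_func, raw=True)
          .reset_index(level=0, drop=True)  # drop the provider level, keep the original row index
    )

# An existing NumPy reduction
df['moving_avg_3'] = provider_rolling_apply(df, 'points', 3, np.mean)

# A custom reduction (max minus min within each window)
df['moving_range_3'] = provider_rolling_apply(df, 'points', 3, lambda w: w.max() - w.min())

print(df)
```

Calling a Python function once per window is generally slower than pandas' built-in rolling methods, so this form is probably best kept for aggregations that pandas does not already provide.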
Solution 2:[2]
Instead, pass your agg_method as a string and call agg with it:
def provider_rolling_window(df,ind,window_size,agg_method):
    s = df.sort_values(by=['provider','date'], ascending=True)\
          .groupby(['provider'])[ind]\
          .rolling(window_size, min_periods = 1)\
          .agg(agg_method)\
          .reset_index(drop=True,level=0)
    return(s)

df['moving_avg_3'] = provider_rolling_window(df,'points',3,'mean')
Output:
>>> df
         date provider  points  moving_avg_3
0  2020-01-13        A      10          10.0
1  2020-09-19        B       2           2.0
2  2021-05-10        A       1           5.5
3  2022-02-01        B       8           5.0
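A variation on the string-based approach is to look the method up by name with Python's built-in getattr, which behaves like hard-coding .mean() or .std() and also lets method-specific keyword arguments through. This is only a sketch: the function name provider_rolling_by_name, the **method_kwargs parameter, and the moving_std_3 column are assumptions added for illustration, not part of either answer.

```python
import pandas as pd

df = pd.DataFrame({'date': ['2020-01-13', '2020-09-19', '2021-05-10', '2022-02-01'],
                   'provider': ['A', 'B', 'A', 'B'],
                   'points': [10, 2, 1, 8]})

def provider_rolling_by_name(df, ind, window_size, method_name, **method_kwargs):
    # Hypothetical helper: build the rolling object first ...
    rolling = (
        df.sort_values(by=['provider', 'date'])
          .groupby('provider')[ind]
          .rolling(window_size, min_periods=1)
    )
    # ... then resolve the method by name, e.g. 'mean' -> rolling.mean
    method = getattr(rolling, method_name)
    # Call it, forwarding any extra keyword arguments (such as ddof for std)
    return method(**method_kwargs).reset_index(level=0, drop=True)

df['moving_avg_3'] = provider_rolling_by_name(df, 'points', 3, 'mean')
df['moving_std_3'] = provider_rolling_by_name(df, 'points', 3, 'std', ddof=0)

print(df)
```

A misspelled method name fails with an AttributeError at the getattr call, which tends to point fairly directly at the bad argument.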
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | mackostya |
| Solution 2 | richardec |
