Weighted average with min_count pandas aggregate
I need help extending a weighted average function (already written) with a new feature: I want to choose the min_count necessary to compute the weighted average; if a group has fewer rows than that, it should return NaN.
Here is what I have so far:
import numpy as np
import pandas as pd

def weighted_average(df, float_cols, weight_col, group_every, offset=None):
    df1 = df.copy(deep=True)
    df1 = df1.apply(pd.to_numeric, errors='coerce')
    # The weight column itself is not averaged (build a new list so the caller's list is not mutated)
    float_cols = [c for c in float_cols if c != weight_col]
    df2 = pd.DataFrame(data=[], columns=float_cols)
    for i in float_cols:
        dfaux = df1.copy()
        # Weighted value; NaN wherever the value itself is missing
        dfaux['mass_wt'] = np.where(dfaux[i].notnull(), dfaux[weight_col] * dfaux[i], np.nan)
        # query('x == x') drops rows where column i is NaN before grouping by time bin
        op = dfaux[[i, 'mass_wt', weight_col]]. \
            query(f'({i} == {i})'). \
            groupby(pd.Grouper(freq=group_every, offset=offset)). \
            agg(weightcol_sum=(weight_col, 'sum'), weightcol_mean=(weight_col, 'mean'),
                masswt_sum=('mass_wt', 'sum'), masswt_count=('mass_wt', 'count'))
        op['op'] = op['masswt_sum'] / op['weightcol_sum']
        df2[i] = op['op']
        df2[f'{weight_col}_mean'] = op['weightcol_mean'].copy(deep=True)
        df2[f'{weight_col}_sum'] = op['weightcol_sum'].copy(deep=True)
        df2[f'{weight_col}_count'] = op['masswt_count'].copy(deep=True)
        print(i)
    return df2
In the 'agg' part (or in pd.Grouper), I need to set a min_count. For example, if I group by day, I need at least 20 rows of 'weight_col' data in order to return a result; otherwise it should return NaN.
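One possible way to get this behaviour, sketched here for a single column rather than as a drop-in replacement: compute the per-bin row count alongside the sums, then mask the result with `Series.where`. The function name `weighted_average_min_count` and the `n_rows` column are hypothetical names chosen for illustration, and the sketch assumes the DataFrame has a DatetimeIndex, as the `pd.Grouper(freq=...)` call in the question implies. Unlike the function above, it also drops rows where the weight is missing, which is an assumption about what should count towards min_count.

```python
import numpy as np
import pandas as pd


def weighted_average_min_count(df, value_col, weight_col, group_every,
                               min_count=1, offset=None):
    """Weighted average per time bin; NaN where a bin has < min_count valid rows.

    Hypothetical helper for illustration; assumes df has a DatetimeIndex.
    """
    # Coerce to numeric and keep only rows where value AND weight are present
    # (assumption: a row with a missing weight should not count).
    valid = df[[value_col, weight_col]].apply(pd.to_numeric, errors='coerce').dropna()
    valid = valid.assign(mass_wt=valid[value_col] * valid[weight_col])

    op = valid.groupby(pd.Grouper(freq=group_every, offset=offset)).agg(
        weightcol_sum=(weight_col, 'sum'),
        masswt_sum=('mass_wt', 'sum'),
        n_rows=(weight_col, 'count'),  # number of valid rows in each bin
    )

    result = op['masswt_sum'] / op['weightcol_sum']
    # Keep the result only where the bin has enough rows; NaN elsewhere.
    return result.where(op['n_rows'] >= min_count)
```

With `group_every='D'` and `min_count=20` this would correspond to the daily case described above. The same mask could be applied inside the loop of the original function, for example `op['op'] = op['op'].where(op['masswt_count'] >= min_count)`, since `masswt_count` already holds the per-bin count.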
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
