Applying a function to a pandas groupby object returns a variable type
When applying a user-defined function f that takes a pd.DataFrame as input and returns a pd.Series on a pd.DataFrame.groupby object, the type of the returned object seems to depend on the number of unique values present in the field used to perform the grouping operation.
I am trying to understand why the API behaves this way, and I am looking for a neat way to have a pd.Series returned regardless of the number of unique values in the grouping field.
I went through the split-apply-combine section of the pandas documentation, and it seems like the single-valued dataframe is treated as a pd.Series, which does not make sense to me.
import pandas as pd
from typing import Union

def f(df: pd.DataFrame) -> pd.Series:
    """
    User-defined function
    """
    return df['B'] / df['B'].max()

# Should only output a pd.Series
def perform_apply(df: pd.DataFrame) -> Union[pd.Series, pd.DataFrame]:
    return df.groupby('A').apply(f)

# Some dummy dataframe with multiple values in field 'A'
df1 = pd.DataFrame({'A': 'a a b'.split(),
                    'B': [1, 2, 3],
                    'C': [4, 6, 5]})
# Subset of the dataframe with a single value in field 'A'
df2 = df1[df1['A'] == 'a'].copy()

res1 = perform_apply(df1)
res2 = perform_apply(df2)
print(type(res1), type(res2))
# --------------------------------
# -> <class 'pandas.core.series.Series'> <class 'pandas.core.frame.DataFrame'>
pandas : 1.4.2 python : 3.9.0
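One possible way to get a pd.Series in both cases, sketched here as an assumption rather than a confirmed fix, is to select column 'B' and use transform instead of apply. transform is meant to return an object indexed like its input, so the result should not depend on how many groups there are. The helper name perform_transform is hypothetical and not part of the original code, and the index of the result may be laid out differently from what apply produces.

```python
import pandas as pd


def perform_transform(df: pd.DataFrame) -> pd.Series:
    # Hypothetical alternative to perform_apply: divide 'B' by its
    # per-group maximum. Selecting 'B' first gives a SeriesGroupBy, and
    # transform returns a Series aligned with the rows of df.
    return df.groupby('A')['B'].transform(lambda s: s / s.max())


# Same dummy data as in the question
df1 = pd.DataFrame({'A': 'a a b'.split(),
                    'B': [1, 2, 3],
                    'C': [4, 6, 5]})
df2 = df1[df1['A'] == 'a'].copy()

res1 = perform_transform(df1)
res2 = perform_transform(df2)
print(type(res1), type(res2))
```

The trade-off is that the group key 'A' is not part of the resulting index; if it is needed, it can be recovered from the original dataframe because the rows stay aligned.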
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
