Is dropna=True in pandas groupby useful?

I am not certain whether this question is appropriate here; apologies in advance if it is not.

I am a pandas maintainer, and for the 1.5 release I've recently been fixing bugs in groupby when dropna=True is combined with transform. For example, in pandas 1.4.2,

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 1, np.nan], 'b': [2, 3, 4]})
print(df.groupby('a', dropna=True).transform('sum'))

produces incorrect output (in particular, the last row is wrong).

While working on this, I've been wondering how useful the dropna argument to groupby actually is. For aggregations (e.g. df.groupby('a').sum()) and filters (e.g. df.groupby('a').head(2)), it seems to me that it's always possible to drop the offending rows before the groupby. Moreover, in my own use of pandas, if there are null values in the groupers, then I want them in the groupby result.

For transformations, where the index of the result must match that of the input, rows whose group key is null are instead filled with null. For the code block above, the output should be

     b
0  5.0
1  5.0
2  NaN

But I can't imagine this result ever being useful. And in case it is, it is not difficult to reproduce with dropna=False:

# Keep null keys as their own group, then blank out the rows whose key is null
result = df.groupby('a', dropna=False).transform('sum')
result.loc[df['a'].isnull()] = np.nan
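A minimal self-contained check of that workaround, assuming pandas 1.5 or later, where the dropna=True transform bugs described above are fixed. The explicit astype('float64') is an addition of this sketch, not part of the snippet above; it performs the upcast to float before NaN is written instead of relying on implicit upcasting during assignment.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 1, np.nan], 'b': [2, 3, 4]})

# Workaround: treat the null key as its own group...
workaround = df.groupby('a', dropna=False).transform('sum')
# ...upcast to float so the column can hold NaN (assumption of this sketch)...
workaround = workaround.astype('float64')
# ...then blank out every row whose group key is null.
workaround.loc[df['a'].isna()] = np.nan

# Built-in behaviour, assuming pandas >= 1.5.
builtin = df.groupby('a', dropna=True).transform('sum')

# Raises AssertionError if the two results differ.
pd.testing.assert_frame_equal(workaround, builtin)
print(workaround)
```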

If we were able to deprecate and then remove the dropna argument to groupby (i.e. groupby would always behave as if dropna=False), this would simplify a good part of the groupby code.
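If groupby always behaved as if dropna=False, the current dropna=True results for aggregations and filters could still be obtained by dropping null keys first, as suggested earlier. A minimal sketch of that emulation on the same toy frame, assuming a recent pandas (1.5 or later) in which head() already skips null-key rows under dropna=True; pre_dropped and the other variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 1, np.nan], 'b': [2, 3, 4]})

# Drop rows whose group key is null up front.
pre_dropped = df.dropna(subset=['a'])

# Aggregation: current dropna=True vs. pre-dropping + dropna=False.
agg_current = df.groupby('a', dropna=True).sum()
agg_emulated = pre_dropped.groupby('a', dropna=False).sum()

# Filter: head() under dropna=True also excludes the null-key row.
head_current = df.groupby('a', dropna=True).head(2)
head_emulated = pre_dropped.groupby('a', dropna=False).head(2)

print(agg_current.equals(agg_emulated))    # True
print(head_current.equals(head_emulated))
```

Transformations are the case this emulation does not cover directly, since pre-dropping rows changes the index of the result; that is where the masking step above comes in.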

So I'd like to ask whether there are examples where dropna=True is used and the operation would otherwise be hard to accomplish.

Thanks!

pandas pandas-groupby

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
