'Pandas groupby().sum() is not ignoring None/empty/np.nan values
I am facing a weird problem with Pandas groupby().sum() together.
I have a DF with 2 category columns and 3 numeric/float value columns - value columns have Nones as shown below:
cat_1 cat_2 val_1 val_2 val_3
0 b z 0.1 NaN NaN
1 b x 0.1 NaN NaN
2 c y 0.1 1.0 NaN
3 c z 0.1 NaN NaN
4 c x 0.1 1.0 NaN
I want to group-by cat_1 and then sum the value columns: val_1, val_2 and val_3 per category.
This is what the final aggregation should look like:
cat_1 val_1 val_2 val_3
0 b 0.2 NaN NaN
1 c 0.3 2.0 NaN
Problem: is that when I do df.groupby(["cat_1"], as_index=False).sum(), I get 0.0 for categories where all values are None/null:
cat_1 val_1 val_2 val_3
0 b 0.2 0.0 0.0
1 c 0.3 2.0 0.0
How can I fix this issue? Thank you.
Solution 1:[1]
It seems there is no direct way of doing it like via DataFrame.sum(skipna=False) but it is not implemented for groupby-sum. However, as ScootCork mentioned, the required behaviour can be achieved using: groupby().sum(min_count=1) - from documentation.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | user3557405 |
