'Sum using groupby?

I want to add up polygamous women in my data by region. Here is my code :

import geopandas as gpd
import matplotlib as mpl
import matplotlib.pyplot as plt

database_path = "dataformap.csv"
database = pd.read_csv(database_path)
#Creating a geodataframe
points = gpd.points_from_xy(database["longitude"], database["latitude"], crs="EPSG:4326")
map = gpd.GeoDataFrame (database, geometry=points) 
database.set_index('geo_bj2017')
database['polygamy']=database.groupby('geo_bj2017')['polygam'].sum()

Here is what I obtain : enter image description here

Here is what database.head() gives

0   7.283433    2.12276 1   2017    alibori POINT (2.12276 7.28343) NaN
1   7.283433    2.12276 1   2017    alibori POINT (2.12276 7.28343) NaN
2   7.283433    2.12276 1   2017    alibori POINT (2.12276 7.28343) NaN
3   7.283433    2.12276 1   2017    alibori POINT (2.12276 7.28343) NaN
4   7.283433    2.12276 1   2017    alibori POINT (2.12276 7.28343) NaN

Is there a way to make this work on python ? Thanks in advance for your help



Solution 1:[1]

Without knowing the data, assuming the problem comes from NaNs in the data...

Instead of using a normal sum, you should use a "nan-safe" sum, like nansum from numpy like database['polygamy']=database.groupby('geo_bj2017')['polygam'].agg(np.nansum)

Note however, that for me groupby().sum() already operates with NA without issues:

>>> df=pd.DataFrame({'a':[0,1,0,1], 'b':[1, 2, 3, pd.NA]})
>>> df.groupby('a').agg(np.nansum)
   b
a
0  4
1  2

>>> df=pd.DataFrame({'a':[0,1,0,1], 'b':[1, 2, 3, np.nan] })
>>> df.groupby('a').sum()
     b
a
0  4.0
1  2.0

EDIT When groupingby/sum, you get an aggregation, not a "column" to paste. Example with partial data. I am missing the headers in your table.

x="""index x y polygam year geo_bj2017 
0   7.283433    2.12276 1   2017    alibori 
1   7.283433    2.12276 1   2017    alibori 
2   7.283433    2.12276 1   2017    alibori 
3   7.283433    2.12276 1   2017    alibori 
4   7.283433    2.12276 1   2017    alibori """

df = pd.read_csv(StringIO(x),sep='\s+')
df.groupby('geo_bj2017')['polygam'].sum()
>>> df.groupby('geo_bj2017')['polygam'].sum()
geo_bj2017
alibori    5
Name: polygam, dtype: int64

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1