'Sum using groupby?
I want to add up polygamous women in my data by region. Here is my code :
import geopandas as gpd
import matplotlib as mpl
import matplotlib.pyplot as plt
database_path = "dataformap.csv"
database = pd.read_csv(database_path)
#Creating a geodataframe
points = gpd.points_from_xy(database["longitude"], database["latitude"], crs="EPSG:4326")
map = gpd.GeoDataFrame (database, geometry=points)
database.set_index('geo_bj2017')
database['polygamy']=database.groupby('geo_bj2017')['polygam'].sum()
Here is what I obtain : enter image description here
Here is what database.head() gives
0 7.283433 2.12276 1 2017 alibori POINT (2.12276 7.28343) NaN
1 7.283433 2.12276 1 2017 alibori POINT (2.12276 7.28343) NaN
2 7.283433 2.12276 1 2017 alibori POINT (2.12276 7.28343) NaN
3 7.283433 2.12276 1 2017 alibori POINT (2.12276 7.28343) NaN
4 7.283433 2.12276 1 2017 alibori POINT (2.12276 7.28343) NaN
Is there a way to make this work on python ? Thanks in advance for your help
Solution 1:[1]
Without knowing the data, assuming the problem comes from NaNs in the data...
Instead of using a normal sum, you should use a "nan-safe" sum, like nansum from numpy like
database['polygamy']=database.groupby('geo_bj2017')['polygam'].agg(np.nansum)
Note however, that for me groupby().sum() already operates with NA without issues:
>>> df=pd.DataFrame({'a':[0,1,0,1], 'b':[1, 2, 3, pd.NA]})
>>> df.groupby('a').agg(np.nansum)
b
a
0 4
1 2
>>> df=pd.DataFrame({'a':[0,1,0,1], 'b':[1, 2, 3, np.nan] })
>>> df.groupby('a').sum()
b
a
0 4.0
1 2.0
EDIT When groupingby/sum, you get an aggregation, not a "column" to paste. Example with partial data. I am missing the headers in your table.
x="""index x y polygam year geo_bj2017
0 7.283433 2.12276 1 2017 alibori
1 7.283433 2.12276 1 2017 alibori
2 7.283433 2.12276 1 2017 alibori
3 7.283433 2.12276 1 2017 alibori
4 7.283433 2.12276 1 2017 alibori """
df = pd.read_csv(StringIO(x),sep='\s+')
df.groupby('geo_bj2017')['polygam'].sum()
>>> df.groupby('geo_bj2017')['polygam'].sum()
geo_bj2017
alibori 5
Name: polygam, dtype: int64
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
