'Different aggfunc based on different logics for pandas pivot table
I would like to apply different "aggfunc" logics to a pandas pivot table. Lets suppose that I have the below df.
df1 = pd.DataFrame({'Country':['Italy', 'Italy', 'Italy', 'Germany','Germany', 'Germany', 'France', 'France'],
'City':['Rome','Rome',"Florence",'Berlin', 'Munich', 'Koln', "Paris", "Paris"],
'Numbers':[100,200,300,400,500,600,700,800]})
I would like to calculate the sum of "Numbers" per City and the mean of "Numbers" based on the Country. I should get the below output.
I must use the pd.pivot. But if you have better solutions, you can ALSO suggest that.
Would you be able to help me out?
| Country | City | SUM | MEAN |
|---|---|---|---|
| France | Paris | 1500 | 750 |
| Germany | Berlin | 400 | 500 |
| Germany | Köln | 600 | 500 |
| Germany | Munich | 500 | 500 |
| Italy | Florence | 300 | 200 |
| Italy | Rome | 300 | 200 |
I have tried using the following but it obviously does not work.
pd.pivot_table(df1, values = 'Numbers', index=['Country', 'City'], aggfunc=[np.sum, np.mean])
Solution 1:[1]
use GroupBy.transform
new_df = \
df1.assign(
SUM = df1.groupby('City', sort=False)['Numbers'].transform('sum'),
MEAN = df1.groupby('Country', sort=False)['Numbers'].transform('mean')
).drop_duplicates(['Country', 'City']).drop('Numbers', axis=1)
Country City SUM MEAN
0 Italy Rome 300 200
1 Italy Rome 300 200
2 Italy Florence 300 200
3 Germany Berlin 400 500
4 Germany Munich 500 500
5 Germany Koln 600 500
6 France Paris 1500 750
7 France Paris 1500 750
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | ansev |
