'Pandas converting absolute value to percentage of multiple groupby rows [duplicate]
I have a long format df that has an amount column (absolute values) aggregated up to 3 different levels of date, country and group.
import pandas as pd
df = pd.DataFrame.from_dict([{ "date": "2022-02", "country": "Serbia", "group": 3, "amount": 33948 }, { "date": "2021-05", "country": "Thailand", "group": 3, "amount": 15857 }, { "date": "2021-05", "country": "Russia", "group": 2, "amount": 42855 }, { "date": "2021-11", "country": "Ukraine", "group": 3, "amount": 57306 }, { "date": "2021-06", "country": "Poland", "group": 2, "amount": 52898 }, { "date": "2022-02", "country": "Indonesia", "group": 3, "amount": 32330 }, { "date": "2021-11", "country": "Indonesia", "group": 1, "amount": 33791 }, { "date": "2022-02", "country": "China", "group": 3, "amount": 45050 }, { "date": "2021-12", "country": "Indonesia", "group": 1, "amount": 13865 }, { "date": "2022-03", "country": "Sweden", "group": 1, "amount": 45039 }, { "date": "2021-05", "country": "Colombia", "group": 3, "amount": 9363 }, { "date": "2022-01", "country": "Bangladesh", "group": 1, "amount": 47121 }, { "date": "2022-02", "country": "Indonesia", "group": 2, "amount": 18855 }, { "date": "2021-05", "country": "China", "group": 1, "amount": 49383 }, { "date": "2021-06", "country": "Turkmenistan", "group": 3, "amount": 61386 }, { "date": "2021-09", "country": "Kenya", "group": 3, "amount": 40434 }, { "date": "2022-03", "country": "Nicaragua", "group": 3, "amount": 3801 }, { "date": "2022-02", "country": "China", "group": 1, "amount": 39416 }, { "date": "2022-03", "country": "Brazil", "group": 1, "amount": 13657 }, { "date": "2021-05", "country": "Colombia", "group": 2, "amount": 23473 }, { "date": "2022-02", "country": "China", "group": 3, "amount": 19742 }, { "date": "2021-08", "country": "Russia", "group": 2, "amount": 45098 }, { "date": "2022-01", "country": "China", "group": 3, "amount": 15158 }, { "date": "2021-08", "country": "China", "group": 3, "amount": 18376 }, { "date": "2022-01", "country": "Slovenia", "group": 2, "amount": 71213 }, { "date": "2022-02", "country": "Czech Republic", "group": 2, "amount": 32744 }, { "date": "2021-06", "country": "Netherlands", "group": 1, "amount": 42706 }, { "date": "2021-07", "country": "China", "group": 2, "amount": 40277 }, { "date": "2021-08", "country": "United States", "group": 2, "amount": 3070 }, { "date": "2021-07", "country": "Germany", "group": 3, "amount": 17039 }, { "date": "2021-12", "country": "China", "group": 2, "amount": 8714 }, { "date": "2022-01", "country": "Malta", "group": 2, "amount": 44230 }, { "date": "2022-01", "country": "Russia", "group": 3, "amount": 33626 }, { "date": "2021-09", "country": "Greece", "group": 2, "amount": 72860 }, { "date": "2021-08", "country": "China", "group": 1, "amount": 59254 }, { "date": "2022-01", "country": "Japan", "group": 3, "amount": 18136 }, { "date": "2021-08", "country": "Venezuela", "group": 2, "amount": 14065 }, { "date": "2022-01", "country": "China", "group": 2, "amount": 36930 }, { "date": "2022-01", "country": "Honduras", "group": 2, "amount": 768 }, { "date": "2021-08", "country": "Vietnam", "group": 2, "amount": 33652 }, { "date": "2021-07", "country": "Ukraine", "group": 2, "amount": 54050 }, { "date": "2021-09", "country": "Indonesia", "group": 2, "amount": 50304 }, { "date": "2021-10", "country": "Peru", "group": 1, "amount": 27157 }, { "date": "2021-08", "country": "Brazil", "group": 3, "amount": 15869 }, { "date": "2021-11", "country": "Sweden", "group": 1, "amount": 32451 }, { "date": "2021-12", "country": "Mozambique", "group": 2, "amount": 29659 }, { "date": "2022-01", "country": "Argentina", "group": 2, "amount": 25282 }, { "date": "2021-06", "country": "Mongolia", "group": 2, "amount": 63027 }, { "date": "2021-07", "country": "Sudan", "group": 2, "amount": 5006 }, { "date": "2021-08", "country": "United States", "group": 2, "amount": 73414 }, { "date": "2021-05", "country": "China", "group": 3, "amount": 34759 }, { "date": "2021-12", "country": "Brazil", "group": 1, "amount": 636 }, { "date": "2021-06", "country": "Philippines", "group": 2, "amount": 59227 }, { "date": "2021-10", "country": "Russia", "group": 1, "amount": 28537 }, { "date": "2021-08", "country": "China", "group": 3, "amount": 23460 }, { "date": "2022-02", "country": "Philippines", "group": 2, "amount": 62968 }, { "date": "2021-10", "country": "Ukraine", "group": 3, "amount": 63908 }, { "date": "2021-10", "country": "Ukraine", "group": 3, "amount": 38263 }, { "date": "2021-06", "country": "Botswana", "group": 1, "amount": 15918 }, { "date": "2022-02", "country": "Russia", "group": 1, "amount": 31156 }, { "date": "2021-07", "country": "France", "group": 3, "amount": 64077 }, { "date": "2021-07", "country": "China", "group": 1, "amount": 18932 }, { "date": "2022-02", "country": "Russia", "group": 1, "amount": 45279 }, { "date": "2021-07", "country": "Russia", "group": 1, "amount": 7849 }, { "date": "2021-09", "country": "China", "group": 1, "amount": 52640 }, { "date": "2021-07", "country": "Peru", "group": 2, "amount": 19369 }, { "date": "2021-07", "country": "Greece", "group": 1, "amount": 20489 }, { "date": "2021-11", "country": "China", "group": 3, "amount": 30177 }, { "date": "2021-07", "country": "Portugal", "group": 1, "amount": 69521 }, { "date": "2021-06", "country": "Thailand", "group": 3, "amount": 17341 }, { "date": "2021-12", "country": "Peru", "group": 3, "amount": 27012 }, { "date": "2021-12", "country": "Afghanistan", "group": 1, "amount": 34146 }, { "date": "2021-11", "country": "Indonesia", "group": 1, "amount": 57619 }, { "date": "2021-05", "country": "Portugal", "group": 1, "amount": 37319 }, { "date": "2022-01", "country": "Denmark", "group": 1, "amount": 18370 }, { "date": "2022-01", "country": "United States", "group": 3, "amount": 4690 }, { "date": "2021-12", "country": "China", "group": 1, "amount": 35333 }, { "date": "2021-10", "country": "Indonesia", "group": 3, "amount": 74285 }, { "date": "2021-09", "country": "Mexico", "group": 1, "amount": 11260 }, { "date": "2021-11", "country": "Ukraine", "group": 3, "amount": 44389 }, { "date": "2021-11", "country": "France", "group": 3, "amount": 29432 }, { "date": "2021-08", "country": "Ecuador", "group": 1, "amount": 24529 }, { "date": "2021-08", "country": "Democratic Republic of the Congo", "group": 1, "amount": 5211 }, { "date": "2021-12", "country": "Georgia", "group": 3, "amount": 54164 }, { "date": "2021-05", "country": "France", "group": 2, "amount": 9046 }, { "date": "2021-05", "country": "Sweden", "group": 1, "amount": 10326 }, { "date": "2022-02", "country": "Madagascar", "group": 1, "amount": 70109 }, { "date": "2022-01", "country": "China", "group": 1, "amount": 25702 }, { "date": "2021-09", "country": "Poland", "group": 2, "amount": 46625 }, { "date": "2022-01", "country": "Czech Republic", "group": 1, "amount": 23806 }, { "date": "2021-06", "country": "Poland", "group": 2, "amount": 63310 }, { "date": "2021-11", "country": "Poland", "group": 3, "amount": 56290 }, { "date": "2021-12", "country": "Russia", "group": 3, "amount": 45846 }, { "date": "2021-09", "country": "Sweden", "group": 3, "amount": 26358 }, { "date": "2021-09", "country": "Colombia", "group": 2, "amount": 14682 }, { "date": "2021-11", "country": "China", "group": 1, "amount": 65021 }, { "date": "2022-02", "country": "Peru", "group": 1, "amount": 29406 }, { "date": "2022-01", "country": "China", "group": 1, "amount": 57333 }, { "date": "2021-05", "country": "Philippines", "group": 2, "amount": 28340 }, { "date": "2021-10", "country": "Japan", "group": 2, "amount": 37300 }])
example df
date country group amount
0 2022-02 Serbia 3 33948
1 2021-05 Thailand 3 15857
2 2021-05 Russia 2 42855
3 2021-11 Ukraine 3 57306
4 2021-06 Poland 2 52898
...
Date can be any yyyy-mm, Country can be any country, and group can be either 1, 2, 3.
What I would like to do is group by date and country, and then for each group, work out the relative percentage of the grouped date and country.
e.g. take the original df above to get something like (example of one date and one country):
date country group amount
2022-02 Serbia 1 33948
2 34567
3 96787
and then convert the amount to a percentage:
date country group amount_percentage
2022-02 Serbia 1 20.5
2 20.9
3 58.6
before collapsing back to the original format for all date and countries:
date country group amount_percentage
0 2022-02 Serbia 1 20.5
1 2022-02 Serbia 2 20.9
2 2022-02 Serbia 3 58.6
...
My current approach to tackling this is:
df.groupby(['date', 'country', 'group'])['amount'].sum().unstack()
which gives me grouped date and country columns, and the groups becoming column with relative amounts.
date country 1 2 3
2022-02 Serbia 33948 34567 96787
USA 23457 67589 23456
...
However I am not sure how to convert these to percentages of the row total, and then convert the dataframe back to final format. How would you address this last part?
Solution 1:[1]
If I understand correctly, you can do a groupby and then transform('sum') on amount, and divide amount by that:
df['amount_percentage'] = df['amount'] / df.groupby(['date', 'country'], sort=False)['amount'].transform('sum') * 100
Output:
>>> df
date country group amount amount_percentage
0 2022-02 Serbia 3 33948 100.000000
1 2021-05 Thailand 3 15857 100.000000
2 2021-05 Russia 2 42855 100.000000
3 2021-11 Ukraine 3 57306 56.350853
4 2021-06 Poland 2 52898 45.520102
.. ... ... ... ... ...
95 2021-11 China 1 65021 68.300805
96 2022-02 Peru 1 29406 100.000000
97 2022-01 China 1 57333 42.430230
98 2021-05 Philippines 2 28340 100.000000
99 2021-10 Japan 2 37300 100.000000
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | richardec |
