Pandas multi-index divide items by sum of their group
I have a table like below:
I want to change this table so that the values in groupA and groupB are percentages of the totals across each date. For example, the value in groupA for Jim 2023-04-28 should become 13/(44+62). The output table would look like:
I'm unsure how to do this since this is a multi-index table. So far, I have tried manually iterating through all of the dates with a for loop and storing the sums of the total column in a dictionary, but this seems extremely slow.
Here is the dict of the top data frame (for testing!):
df_a = {
    'total': {
        ("5:00", 73482373, '2023-04-28', 'Jim'): 44.0,
        ("5:00", 73482373, '2023-04-28', 'Amy'): 62.0,
        ("5:00", 73482373, '2023-04-25', 'Jim'): 80.0,
        ("5:00", 73482373, '2023-04-25', 'Amy'): 42.0,
    },
    'groupA': {
        ("5:00", 73482373, '2023-04-28', 'Jim'): 13.0,
        ("5:00", 73482373, '2023-04-28', 'Amy'): 15.0,
        ("5:00", 73482373, '2023-04-25', 'Jim'): 0.0,
        ("5:00", 73482373, '2023-04-25', 'Amy'): 12.0,
    },
    'groupB': {
        ("5:00", 73482373, '2023-04-28', 'Jim'): 17.0,
        ("5:00", 73482373, '2023-04-28', 'Amy'): 21.0,
        ("5:00", 73482373, '2023-04-25', 'Jim'): 0.0,
        ("5:00", 73482373, '2023-04-25', 'Amy'): 12.0,
    },
}
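For anyone testing with this dict: it is a plain Python dict, not yet a data frame. Below is a minimal sketch of loading such a dict, assuming pandas is imported as `pd`. The names `sample` and `df` are illustrative only, and `sample` is shortened to the two 2023-04-28 rows to keep it brief.

```python
import pandas as pd

# Shortened, illustrative copy of the question's dict (2023-04-28 rows only).
sample = {
    'total': {
        ("5:00", 73482373, '2023-04-28', 'Jim'): 44.0,
        ("5:00", 73482373, '2023-04-28', 'Amy'): 62.0,
    },
    'groupA': {
        ("5:00", 73482373, '2023-04-28', 'Jim'): 13.0,
        ("5:00", 73482373, '2023-04-28', 'Amy'): 15.0,
    },
    'groupB': {
        ("5:00", 73482373, '2023-04-28', 'Jim'): 17.0,
        ("5:00", 73482373, '2023-04-28', 'Amy'): 21.0,
    },
}

# Outer keys become columns; the tuple keys of the inner dicts
# become the rows of a four-level MultiIndex.
df = pd.DataFrame(sample)
print(df)
```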
Solution 1:[1]
You could do it using groupby and transform:
df[['groupA', 'groupB']] = df[['groupA', 'groupB']] / df.groupby(level=[0,1,2])['total'].transform('sum').to_numpy()[:, None]
Output:
>>> df
                                total    groupA    groupB
5:00 73482373 2023-04-28 Jim     44.0  0.122642  0.160377
                         Amy     62.0  0.141509  0.198113
              2023-04-25 Jim     80.0  0.000000  0.000000
                         Amy     42.0  0.098361  0.098361
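For reference, here is a self-contained sketch of the same idea that can be run as-is. It rebuilds the question's sample data and, as an alternative spelling rather than the answerer's exact code, uses `DataFrame.div(..., axis=0)` in place of `.to_numpy()[:, None]`; the names `idx`, `group_total` and `result` are illustrative.

```python
import pandas as pd

# Rebuild the question's sample data with an explicit four-level MultiIndex.
idx = pd.MultiIndex.from_tuples([
    ("5:00", 73482373, '2023-04-28', 'Jim'),
    ("5:00", 73482373, '2023-04-28', 'Amy'),
    ("5:00", 73482373, '2023-04-25', 'Jim'),
    ("5:00", 73482373, '2023-04-25', 'Amy'),
])
df = pd.DataFrame(
    {
        'total':  [44.0, 62.0, 80.0, 42.0],
        'groupA': [13.0, 15.0, 0.0, 12.0],
        'groupB': [17.0, 21.0, 0.0, 12.0],
    },
    index=idx,
)

# Sum `total` over the first three index levels (everything except the name);
# transform('sum') repeats each group's sum on every row of that group.
group_total = df.groupby(level=[0, 1, 2])['total'].transform('sum')

# Divide each row of groupA/groupB by its group's total.
# axis=0 aligns the Series with the frame's row index.
result = df.copy()
result[['groupA', 'groupB']] = df[['groupA', 'groupB']].div(group_total, axis=0)
print(result)
```

Because `transform` returns a Series with the same index as `df`, no manual loop over dates is needed; the first row's groupA value is 13/(44+62), matching the example in the question.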
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | richardec |


