'Pandas sum multi-index columns with same name
I know that I can sum index's by:
df["name1"]+df["name2"]
But how does sum work when the two index names are the same?
Given the following CSV:
,,College 1,,,,,,,,,,,,College 2,,,,,,,,,,,,College 3,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,Music,,,,Geography,,,,Business,,,,Mathematics,,,,Biology,,,,Geography,,,,Business,,,,Biology,,,,Technology,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,F,P,M,D,F,P,M,D,F,P,M,D,F,P,M,D,F,P,M,D,F,P,M,D,F,P,M,D,F,P,M,D,F,P,M,D
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Year 1,M,0,5,7,9,2,18,5,10,4,9,6,2,4,14,18,11,10,19,18,20,3,17,19,13,4,9,6,2,0,10,11,14,4,12,12,5
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,F,0,13,14,11,0,6,8,6,2,12,14,9,9,17,12,18,6,17,16,14,0,4,2,5,2,12,14,9,10,11,18,20,0,5,7,8
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Year 2,M,5,10,6,6,1,20,5,18,4,9,6,2,10,13,15,19,2,18,16,13,1,19,5,12,4,9,6,2,1,13,15,18,3,19,8,16
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,F,1,11,14,15,0,9,9,2,2,12,14,9,7,17,18,14,9,18,13,14,0,9,2,10,2,12,14,9,0,17,19,19,0,4,6,4
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Evening,M,4,10,6,5,3,13,19,5,4,9,6,2,8,17,10,18,3,11,20,11,4,18,17,20,4,9,6,2,8,12,16,13,4,19,18,7
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,F,4,12,12,13,0,9,3,8,2,12,14,9,0,18,11,18,1,13,13,10,0,6,2,8,2,12,14,9,9,16,20,13,0,10,5,6
I can clean the file and setup a multi-index with pandas and numpy:
df = pd.read_csv("CollegeGrades2.csv", index_col=[0,1], header=[0,1,2], skiprows=lambda x: x%2 == 1)
df.columns = pd.MultiIndex.from_frame(df.columns.to_frame().apply(lambda x: np.where(x.str.contains('Unnamed'), np.nan, x)).ffill())
df.index = pd.MultiIndex.from_frame(df.index.to_frame().ffill())
df.groupby(level=0, sort=False).sum()
However my issue is that I want to total the subjects e.g. College 1 Geography + College 3 Geography and display them in the following output:
I have tried separating them out into different data frames, summing them and then concatenating them but in doing so I lose the headings, for example:
music = df2["College 1", "Music"]
geography = df2["College 1", "Geography"] + df2["College 1", "Geography"]
pd.concat([music,geography], axis=1).groupby(level=0, sort=False).sum()
How I sum the subjects while maintaining my desired output? Any help would be appreciated.
Thank you.
Solution 1:[1]
You can also group by the column:
df.groupby(level=[1, 2], axis=1).sum().groupby(level=0).sum()
Result:
1 Biology Business Geography Mathematics Music Technology
2 D F M P D F M P D F M P D F M P D F M P D F M P
0
Evening 47 21 69 52 22 12 40 42 41 7 41 46 36 8 21 35 18 8 18 22 13 4 23 29
Year 1 68 26 63 57 22 12 40 42 34 5 34 45 29 13 30 31 20 0 21 18 13 4 19 17
Year 2 64 12 63 66 22 12 40 42 42 2 21 57 33 17 33 30 21 6 20 21 20 3 14 23
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Code Different |



