'Pandas groupby with unique values and cumulative sum

I have got the following dataframe:

lst=[['2021','01012021','A','AA',2],['2021','01012021','B','BB',4],['2021','01022021','A','AA',6],['2021','01032021','B','BB',8],['2021','01032021','B','BB',10],['2022','01012022','A','AA',3],['2022','01012022','C','CC',6],['2022','01022022','A','AA',9],['2022','01022022','C','CC',12]]
df1=pd.DataFrame(lst,columns=['YEAR','Month','P1','P2','AuM'])

For each year and each month it should accumulate the AuM of all unique P1 and P2. The result should look like this:

lst=[['2021','01012021','A','AA',2],['2021','01012021','B','BB',4],['2021','01022021','A','AA',8],['2021','01022021','B','BB',4],['2021','01032021','A','AA',16],['2021','01032021','B','BB',14],['2022','01012022','A','AA',3],['2022','01012022','C','CC',6],['2022','01022022','A','AA',12],['2022','01022022','C','CC',18]]
df2=pd.DataFrame(lst,columns=['YEAR','Month','P1','P2','AuM_YTD'])`

Even if in the the year 2021, 01.02.2021 there wasn't any AuM for P1=b and P2=bb the AuM was still shown from other months. Thank you for your help.



Solution 1:[1]

The exact grouping rules are unclear (thus the slightly different output), but the general idea is to to use groupby+cumsum:

                              # month+year        # other groups
df1['AuM_YTD'] = df1.groupby([df1['Month'].str[2:] ,'P1', 'P2'])['AuM'].cumsum()

output:

   YEAR     Month P1  P2  AuM  AuM_YTD
0  2021  01012021  A  AA    2        2
1  2021  01012021  B  BB    4        4
2  2021  01022021  A  AA    6        6
3  2021  01032021  B  BB    8        8
4  2021  01032021  B  BB   10       18
5  2022  01012022  A  AA    3        3
6  2022  01012022  C  CC    6        6
7  2022  01022022  A  AA    9        9
8  2022  01022022  C  CC   12       12

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 mozway