'Pandas groupby with unique values and cumulative sum
I have got the following dataframe:
lst=[['2021','01012021','A','AA',2],['2021','01012021','B','BB',4],['2021','01022021','A','AA',6],['2021','01032021','B','BB',8],['2021','01032021','B','BB',10],['2022','01012022','A','AA',3],['2022','01012022','C','CC',6],['2022','01022022','A','AA',9],['2022','01022022','C','CC',12]]
df1=pd.DataFrame(lst,columns=['YEAR','Month','P1','P2','AuM'])
For each year and each month it should accumulate the AuM of all unique P1 and P2. The result should look like this:
lst=[['2021','01012021','A','AA',2],['2021','01012021','B','BB',4],['2021','01022021','A','AA',8],['2021','01022021','B','BB',4],['2021','01032021','A','AA',16],['2021','01032021','B','BB',14],['2022','01012022','A','AA',3],['2022','01012022','C','CC',6],['2022','01022022','A','AA',12],['2022','01022022','C','CC',18]]
df2=pd.DataFrame(lst,columns=['YEAR','Month','P1','P2','AuM_YTD'])`
Even if in the the year 2021, 01.02.2021 there wasn't any AuM for P1=b and P2=bb the AuM was still shown from other months. Thank you for your help.
Solution 1:[1]
The exact grouping rules are unclear (thus the slightly different output), but the general idea is to to use groupby+cumsum:
# month+year # other groups
df1['AuM_YTD'] = df1.groupby([df1['Month'].str[2:] ,'P1', 'P2'])['AuM'].cumsum()
output:
YEAR Month P1 P2 AuM AuM_YTD
0 2021 01012021 A AA 2 2
1 2021 01012021 B BB 4 4
2 2021 01022021 A AA 6 6
3 2021 01032021 B BB 8 8
4 2021 01032021 B BB 10 18
5 2022 01012022 A AA 3 3
6 2022 01012022 C CC 6 6
7 2022 01022022 A AA 9 9
8 2022 01022022 C CC 12 12
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | mozway |
