'How to calculate cumulative sum (reversed) of a Python DataFrame within given groups?
I have a data frame (df_f) with many (n=19) columns that, if conceptually simplified looks something like this:
| Basin | (n=17 columns) | Chi |
|---|---|---|
| 13.0 | ... | 4 |
| 13.0 | ... | 8 |
| 13.0 | ... | 2 |
| 21.0 | ... | 4 |
| 21.0 | ... | 6 |
| 38.0 | ... | 1 |
| 38.0 | ... | 7 |
| 38.0 | ... | 2 |
| 38.0 | ... | 4 |
The real data frame has around 70,000 rows and around 60 unique 'Basin' ID values (and of course other columns with numerical data, but we don't need them for this, I still want to keep them though for plotting).
What I would like is the following:
I want calculate the cumulative sum of the 'Chi' value, but reversed (so, going up), AND I want it to be reset for each Basin 'group'. So the values should be:
| Basin | (n=17 columns) | Chi_cum |
|---|---|---|
| 13.0 | ... | 14 |
| 13.0 | ... | 10 |
| 13.0 | ... | 2 |
| 21.0 | ... | 10 |
| 21.0 | ... | 6 |
| 38.0 | ... | 14 |
| 38.0 | ... | 13 |
| 38.0 | ... | 6 |
| 38.0 | ... | 4 |
As you can see, in Chi_cum, we progressively add the Chi values going up, but we reset the sum for each 'Basin'.
I know how to do a reverse cumulative sum (df_f['Chi_cum'] = df_f.loc[::-1, 'Chi'].cumsum()[::-1]), but that is for the entire dataframe. Conversely, I know how to use '.groupby', but I can't seem to combine both methods of grouping and summing.
I tried doing something like:
df_f["Chi_cum"] = df_f.groupby(by=['Basin']).sum().iloc[::-1].groupby(level=[0]).cumsum().iloc[::-1]
(as inspired by Pandas Python Groupby Cummulative Sum Reverse), but it does not work!
Could someone please help me figure this out? Thanks!
Solution 1:[1]
You can try with series groupby
df['new'] = df.loc[::-1, 'Chi'].groupby(df['Basin']).cumsum()
df
Out[858]:
Basin (n=17 columns) Chi new
0 13.0 ... 4 14
1 13.0 ... 8 10
2 13.0 ... 2 2
3 21.0 ... 4 10
4 21.0 ... 6 6
5 38.0 ... 1 14
6 38.0 ... 7 13
7 38.0 ... 2 6
8 38.0 ... 4 4
Solution 2:[2]
Apparently you can't actually add ascending=True or reverse=True to cumsum (!?!?), so just reverse before and after cumsum for each group:
df['Chi'] = df.groupby('Basin')['Chi'].apply(lambda x: x[::-1].cumsum()[::-1])
Output:
>>> df
Basin Chi
0 13.0 14
1 13.0 10
2 13.0 2
3 21.0 10
4 21.0 6
5 38.0 14
6 38.0 13
7 38.0 6
8 38.0 4
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | richardec |
| Solution 2 |
