pandas groupby cummax just assigning original values instead of updating the max-so-far
I have this dataframe:
type run corrected_episode Reward
0 notsweet 0 0 35.0
1 notsweet 0 100 20.0
2 notsweet 0 200 20.0
3 notsweet 0 300 22.0
4 notsweet 0 400 20.0
I want to create a new column, best_so_far, that has a monotonically increasing value for the corresponding Reward grouped by type, run, and corrected_episode. Easy enough, right? Except the following happens when I try to use groupby and cummax:
foo['best_so_far'] = foo.groupby(['type','run','corrected_episode']).Reward.cummax()
yields:
type run corrected_episode Reward best_so_far
0 notsweet 0 0 35.0 35.0
1 notsweet 0 100 20.0 20.0
2 notsweet 0 200 20.0 20.0
3 notsweet 0 300 22.0 22.0
4 notsweet 0 400 20.0 20.0
The "best so far", well, isn't the best. I get the same results if I use foo['best_so_far'] = foo.groupby(['type','run','corrected_episode']).Reward.apply(lambda x: x.cummax())
I know this is possible because I've done it dozens of times with other dataframes. There's just something weird about this one that keeps this simple procedure from working.
Solution 1:[1]
You can try removing corrected_episode from the groupby keys:
foo['best_so_far'] = foo.groupby(['type','run']).Reward.cummax()
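One way to see why dropping corrected_episode helps is to compare group sizes under both sets of keys. The sketch below rebuilds the five sample rows from the question; the variable names sizes_three_keys, sizes_two_keys, and best_so_far are only illustrative.

```python
import pandas as pd

# The five sample rows shown in the question.
foo = pd.DataFrame({
    "type": ["notsweet"] * 5,
    "run": [0] * 5,
    "corrected_episode": [0, 100, 200, 300, 400],
    "Reward": [35.0, 20.0, 20.0, 22.0, 20.0],
})

# With all three keys, every group holds exactly one row,
# so a cumulative max within a group is just the row's own value.
sizes_three_keys = foo.groupby(["type", "run", "corrected_episode"]).size()

# With only type and run, the five rows share one group.
sizes_two_keys = foo.groupby(["type", "run"]).size()

# The running maximum is now computed across the whole run.
best_so_far = foo.groupby(["type", "run"])["Reward"].cummax()

print(sizes_three_keys)
print(sizes_two_keys)
print(best_so_far)
```

If the largest group size under a given set of keys is 1, any cumulative operation over those groups will return the original column unchanged.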
Solution 2:[2]
After posting this of course I discovered what happened, but I'm going to share what I did to fix this here because this is the kind of Violation of the Principle of Least Astonishment that pandas is prone to.
The solution was to do this, instead:
foo['best_so_far'] = foo.groupby(['type','run']).Reward.cummax()
That is, I over-specified the grouping columns: including corrected_episode made every (type, run, corrected_episode) combination its own group, so each group contained a single row and cummax() simply returned that row's own Reward. I had originally included corrected_episode to ensure that the rows were in the correct order; the dataframe was actually the result of massaging a lot of data (you are seeing a teeny tiny subset), and the order of the data wasn't necessarily sane for cummax() to work as I envisioned.
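If row order is the concern, one option is to sort by corrected_episode before grouping rather than adding it to the groupby keys. This is a minimal sketch, not necessarily what was done on the full dataset; the rows are the question's sample data, shuffled here on purpose to mimic unsorted input.

```python
import pandas as pd

# Sample rows from the question, deliberately out of episode order.
foo = pd.DataFrame({
    "type": ["notsweet"] * 5,
    "run": [0] * 5,
    "corrected_episode": [300, 0, 400, 100, 200],
    "Reward": [22.0, 35.0, 20.0, 20.0, 20.0],
})

# Establish the row order explicitly; corrected_episode is an ordering
# column here, not a grouping key.
foo = foo.sort_values(["type", "run", "corrected_episode"]).reset_index(drop=True)

# Group only by the columns that identify a single series of episodes.
foo["best_so_far"] = foo.groupby(["type", "run"])["Reward"].cummax()

print(foo)
```

Because cummax() walks the rows of each group in their current order, sorting first keeps the ordering guarantee without splitting every row into its own group.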
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | BENY |
| Solution 2 | Mark Coletti |
