'pandas groupby, counting unique in a group and summing across groups
I have a df as follows:
Date Stock Category
2020-01-01 AAA 1
2020-01-01 BBB 2
2020-01-01 CCC 1
2020-01-02 AAA 1
2020-01-02 BBB 2
What I am looking to achieve is sum by Date and then cumsum across Date while grouping by Category
So the result would look like
Date Category CUMSUM
2020-01-01 1 2
2020-01-01 2 1
2020-01-02 1 3
2020-01-02 2 2
Essentially on 2020-01-01 2 Stocks were in Category 1 and on 2020-01-02 1 Stock was in Category 1 resulting in cumsum=3 on 2020-01-02.
I have tried df.groupby(['Date','Category']).agg({'STOCK':'nunique'}).groupby(level=0).cumsum().reset_index() but this resulted in each Category having the same CUMSUM on any given day.
Any ideas on how I can fix this?
Thanks!
Solution 1:[1]
You pretty much had it, but in your second groupby operation you'll want to group on "Category" in to calculate a cumulative sum across your dates.
An implicit assumption is that your dates will be sorted after the first groupby operation. So make sure your keep sort=True (the default) for your first groupby operation. OR ensure your data is sorted by date prior to the groupby opertaion if you want to specify sort=False for any reason.
out = (
df.groupby(["Date", "Category"])
["Stock"].nunique()
.groupby("Category")
.cumsum()
)
print(out)
Date Category
2020-01-01 1 2
2 1
2020-01-02 1 3
2 2
Name: Stock, dtype: int64
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
