Moving aggregation based on conditions (dataframe)
I have the following data:
| Material | Plant | Date | count | backwards cumulative of count | total |
|---|---|---|---|---|---|
| ACID | 1800 | 2021-07-01 | 1 | 3 | 100 |
| ACID | 1800 | 2021-09-01 | 1 | 2 | 200 |
| ACID | 1800 | 2021-10-01 | 1 | 1 | 300 |
| ACID | 1820 | 2021-09-01 | 2 | 9 | 400 |
| ACID | 1820 | 2021-10-01 | 2 | 7 | 500 |
| ACID | 1820 | 2021-11-01 | 2 | 5 | 200 |
| ACID | 1820 | 2021-12-01 | 3 | 3 | 100 |
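The question does not say how the "backwards cumulative of count" column was built, but its values are consistent with a running sum of count taken from the latest date back to the earliest within each Material and Plant. A minimal sketch of that reading, assuming the rows are already sorted by Date within each group (the frame name `df` is only for illustration):

```python
import pandas as pd

# Sample rows from the table above, without the derived column.
df = pd.DataFrame({
    "Material": ["ACID"] * 7,
    "Plant": [1800, 1800, 1800, 1820, 1820, 1820, 1820],
    "Date": pd.to_datetime([
        "2021-07-01", "2021-09-01", "2021-10-01",
        "2021-09-01", "2021-10-01", "2021-11-01", "2021-12-01",
    ]),
    "count": [1, 1, 1, 2, 2, 2, 3],
})

# Reverse the row order, take a running sum within each group, and let
# pandas align the result back to the original rows by index.
df["backwards cumulative of count"] = (
    df.iloc[::-1].groupby(["Material", "Plant"])["count"].cumsum()
)
print(df)
```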
For each Material and Plant, I need the sum of total from a cutoff date onward, where the cutoff is the most recent date on which the backwards cumulative count is still > 1. The output should show that cutoff date together with the sum.
This is the output I should get:
| Material | Plant | Date | total |
|---|---|---|---|
| ACID | 1800 | 2021-09-01 | 500 |
| ACID | 1820 | 2021-12-01 | 100 |
The total in the first row is the sum of the totals for 2021-09-01 and 2021-10-01 (200 + 300 = 500).
I can get the rows where the cumulative count is above 1, and I know I have to use a groupby and max function in between, but I'm just not sure how.
import numpy as np
select_indices = np.where(df2["backwards cumulative of count"] > 1)[0]
df2.iloc[select_indices]  # rows whose backwards cumulative count is above 1
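One possible way to combine the row selection with a groupby and max is sketched below. It is only an illustration under assumptions: the sample data is retyped from the table above, and the names `keys`, `cutoff`, `merged`, `kept` and `result` are made up for this example. The idea is to take the latest qualifying date per group as a cutoff, then sum total over every row from that cutoff onward.

```python
import pandas as pd

# Sample data from the question.
df2 = pd.DataFrame({
    "Material": ["ACID"] * 7,
    "Plant": [1800, 1800, 1800, 1820, 1820, 1820, 1820],
    "Date": pd.to_datetime([
        "2021-07-01", "2021-09-01", "2021-10-01",
        "2021-09-01", "2021-10-01", "2021-11-01", "2021-12-01",
    ]),
    "count": [1, 1, 1, 2, 2, 2, 3],
    "backwards cumulative of count": [3, 2, 1, 9, 7, 5, 3],
    "total": [100, 200, 300, 400, 500, 200, 100],
})

keys = ["Material", "Plant"]

# Most recent date per group on which the cumulative count is above 1.
qualifying = df2[df2["backwards cumulative of count"] > 1]
cutoff = qualifying.groupby(keys)["Date"].max().rename("cutoff").reset_index()

# Keep each group's rows from its cutoff date onward, then sum `total`.
merged = df2.merge(cutoff, on=keys)
kept = merged[merged["Date"] >= merged["cutoff"]]
result = (
    kept.groupby(keys)
    .agg(Date=("cutoff", "first"), total=("total", "sum"))
    .reset_index()
)
print(result)
```

Because the merge is an inner join, a group with no row above the threshold would drop out of the result entirely; whether that is the desired behaviour is not stated in the question.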
Another way to do it is to simply remove the irrelevant rows, meaning those dated before the most recent qualifying date in each group, so we would end up with:
| Material | Plant | Date | count | backwards cumulative of count | total |
|---|---|---|---|---|---|
| ACID | 1800 | 2021-09-01 | 1 | 2 | 200 |
| ACID | 1800 | 2021-10-01 | 1 | 1 | 300 |
| ACID | 1820 | 2021-12-01 | 3 | 3 | 100 |
and then do the aggregation.
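This filter-then-aggregate route could look roughly like the sketch below. It assumes the sample data from the question, and the helper column `qualifying_date` and the names `cutoff`, `relevant` and `result` are hypothetical. A per-row cutoff is computed with a grouped transform, so the irrelevant rows can be dropped with a single boolean mask before aggregating.

```python
import pandas as pd

# Sample data from the question.
df2 = pd.DataFrame({
    "Material": ["ACID"] * 7,
    "Plant": [1800, 1800, 1800, 1820, 1820, 1820, 1820],
    "Date": pd.to_datetime([
        "2021-07-01", "2021-09-01", "2021-10-01",
        "2021-09-01", "2021-10-01", "2021-11-01", "2021-12-01",
    ]),
    "count": [1, 1, 1, 2, 2, 2, 3],
    "backwards cumulative of count": [3, 2, 1, 9, 7, 5, 3],
    "total": [100, 200, 300, 400, 500, 200, 100],
})

keys = ["Material", "Plant"]

# Date of each qualifying row; rows at or below the threshold become NaT.
qualifying_date = df2["Date"].where(df2["backwards cumulative of count"] > 1)

# Broadcast the latest qualifying date of each group to all of its rows.
cutoff = (
    df2.assign(qualifying_date=qualifying_date)
    .groupby(keys)["qualifying_date"]
    .transform("max")
)

# Remove the rows before the cutoff, then aggregate what is left.
relevant = df2[df2["Date"] >= cutoff]
result = relevant.groupby(keys, as_index=False).agg(
    Date=("Date", "min"), total=("total", "sum")
)
print(relevant)
print(result)
```

Here `relevant` corresponds to the reduced table above, and the earliest remaining date in each group is the cutoff date reported in the output.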
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
