Pandas create counter column for group but reset count based on multiple conditions
I have the following DataFrame:
| Worker | dt_diff | same_employer | same_role |
|---|---|---|---|
| 1754 | 0 days 00:00:00 | False | False |
| 2951 | 0 days 00:00:00 | False | False |
| 2951 | 1 days 00:00:00 | True | True |
| 2951 | 1 days 01:00:00 | True | True |
| 3368 | 0 days 00:00:00 | False | False |
| 3368 | 7 days 00:00:00 | True | True |
| 3368 | 7 days 00:00:00 | True | True |
| 3368 | 7 days 00:00:00 | True | True |
| 3368 | 7 days 00:00:00 | True | True |
| 3368 | 7 days 00:00:00 | True | True |
| 3539 | 0 days 00:00:00 | False | False |
| 3539 | 1 days 00:00:00 | True | True |
| 3539 | 1 days 00:00:00 | True | True |
| 3539 | 3 days 00:30:00 | False | False |
| 3539 | 1 days 00:00:00 | True | True |
| 3539 | 2 days 06:00:00 | False | True |
I would like to create a new column containing a continuity counter, grouped by worker. The counter should reset according to the following rule:
if (dt_diff > 6 days) or (same_employer == False) or (same_role == False), then reset the counter
So for the above DataFrame I would expect the result below:
| Worker | Counter |
|---|---|
| 1754 | 1 |
| 2951 | 3 |
| 3368 | 1 |
| 3539 | 3 |
Solution 1:[1]
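I think the expected counter for worker 3539 should be 1 rather than 3, because the last row has same_employer == False and therefore resets it.
Your reset condition, negated so that True marks a row that continues the current streak: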
s = ~((df['dt_diff'].dt.days > 6) | (df['same_employer'] == False) | (df['same_role'] == False))
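One detail worth checking is how the "more than 6 days" test treats partial days. The sketch below uses a hypothetical three-row frame (not the question's data) to compare a `pd.Timedelta` comparison with the whole-day `.dt.days` comparison. The two agree on the question's sample data but differ for a gap such as 6 days 12 hours.

```python
import pandas as pd

# Hypothetical three-row frame, only to illustrate the reset rule.
df = pd.DataFrame({
    "dt_diff": pd.to_timedelta(["1 days", "6 days 12:00:00", "7 days"]),
    "same_employer": [True, True, True],
    "same_role": [True, True, False],
})

# Reset rule with an exact duration comparison: True where the counter resets.
reset = (
    (df["dt_diff"] > pd.Timedelta(days=6))
    | ~df["same_employer"]
    | ~df["same_role"]
)
print(reset.tolist())

# Same rule using whole days only: 6 days 12 hours has .dt.days == 6,
# so it is not treated as "more than 6 days" here.
reset_whole_days = (
    (df["dt_diff"].dt.days > 6)
    | ~df["same_employer"]
    | ~df["same_role"]
)
print(reset_whole_days.tolist())
```

Which variant is right depends on whether a gap between 6 and 7 days should count as a break in continuity.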
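The key is to count backwards from each worker's last row until the first row that triggers a reset. Reversing s and taking a cumulative product per worker gives a mask that is 1 for those trailing rows and 0 for the reset row and everything before it: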
y = s[::-1].groupby(df['Worker']).cumprod()
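To see what the reversed cumulative product produces, here is a small sketch restricted to workers 2951 and 3539. The `keep` flags are transcribed by hand from the question's rows (True means the row does not reset), and the names `worker`, `keep`, and `mask` are illustrative. The explicit `astype(int)` is an assumption made to keep the product numeric across pandas versions.

```python
import pandas as pd

# Worker ids and keep-flags for workers 2951 and 3539 (True = no reset on that row).
worker = pd.Series([2951, 2951, 2951, 3539, 3539, 3539, 3539, 3539, 3539])
keep = pd.Series([False, True, True, False, True, True, False, True, False])

# Reverse the rows, then take a running product within each worker.
# The product stays 1 until the first reset row seen from the end, then drops to 0.
mask = keep.astype(int)[::-1].groupby(worker).cumprod()

# Back in original row order: only the trailing streak of 2951 is marked.
print(mask.sort_index().tolist())
```

Worker 3539 ends on a reset row, so its mask is 0 everywhere. The grouping key is aligned to the reversed series by index, which is why `worker` does not need to be reversed as well.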
Then sum the mask per worker and add 1, so that the reset row that starts the streak is counted as well:
print(y.groupby(df['Worker']).sum()+1)
Worker
1754 1
2951 3
3368 1
3539 1
dtype: int64
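For reference, the steps above can be put together in one runnable sketch. The frame is transcribed from the question, and `trailing_streak` is a hypothetical helper name, not part of pandas. This version compares against `pd.Timedelta(days=6)` instead of `.dt.days`, which gives the same result on this data. The final `+ 1` assumes that every worker has at least one reset row (here, each worker's first row); for a worker with no reset row at all it would overcount by one.

```python
import pandas as pd

# Data transcribed from the question.
df = pd.DataFrame({
    "Worker": [1754, 2951, 2951, 2951,
               3368, 3368, 3368, 3368, 3368, 3368,
               3539, 3539, 3539, 3539, 3539, 3539],
    "dt_diff": pd.to_timedelta([
        "0 days", "0 days", "1 days", "1 days 01:00:00",
        "0 days", "7 days", "7 days", "7 days", "7 days", "7 days",
        "0 days", "1 days", "1 days", "3 days 00:30:00", "1 days", "2 days 06:00:00",
    ]),
    "same_employer": [False, False, True, True,
                      False, True, True, True, True, True,
                      False, True, True, False, True, False],
    "same_role": [False, False, True, True,
                  False, True, True, True, True, True,
                  False, True, True, False, True, True],
})


def trailing_streak(frame: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: length of each worker's latest streak, reset row included."""
    # True where the row continues the streak (no reset condition fires).
    keep = ~(
        (frame["dt_diff"] > pd.Timedelta(days=6))
        | ~frame["same_employer"]
        | ~frame["same_role"]
    )
    # 1 for trailing rows after the most recent reset, 0 otherwise.
    mask = keep.astype(int)[::-1].groupby(frame["Worker"]).cumprod()
    # Add 1 for the reset row itself (assumes each worker has one).
    return (mask.groupby(frame["Worker"]).sum() + 1).rename("Counter")


counter = trailing_streak(df)
print(counter)
```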
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Raymond Kwok |
