'Checking and filtering the first event of each group in Pandas

I have an event dataset. First of all, I grouped my dataset by user_id and location with pandas library. Then I order my each group by created_at. I want to delete all group if the group does not start with 'Open' event.

My dataset:

user_id location event_name created_at
A1 AAA Event5 9/1/2020 9:06:52 PM
A1 AAA Event6 9/1/2020 9:07:02 PM
A1 AAA Open 9/1/2020 9:07:12 PM
A1 AAA Event1 9/1/2020 9:07:22 PM
A1 AAA Event2 9/1/2020 9:07:32 PM
A1 AAA Event3 9/1/2020 9:07:42 PM
A1 AAA Event2 9/1/2020 9:07:52 PM
A1 AAA Open 9/1/2020 9:08:02 PM
A1 AAA Event1 9/1/2020 9:08:12 PM
A1 AAA Event2 9/1/2020 9:08:22 PM
A2 AAA Event9 9/1/2020 9:07:22 PM

I want:

user_id location event_name created_at
A1 AAA Open 9/1/2020 9:07:12 PM
A1 AAA Event1 9/1/2020 9:07:22 PM
A1 AAA Event2 9/1/2020 9:07:32 PM
A1 AAA Event3 9/1/2020 9:07:42 PM
A1 AAA Open 9/1/2020 9:08:02 PM
A1 AAA Event1 9/1/2020 9:08:12 PM
A1 AAA Event2 9/1/2020 9:08:22 PM

Thank you.



Solution 1:[1]

In short:

df[~(df['event_name'] != 'Open').groupby([df['user_id'], df['location']]).cumprod()]

Explanation:

(df['event_name'] != 'Open') gives us a series of True/False which can also be understood as 1/0 when interpreted as numbers. cumprod apply cumulative product to the True/False series so that as soon as it sees an False (i.e. 0 or 'Open'), it returns a zero and remain zero thereafter.

So we may use zero as the indicator of rows that we wanna to keep.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1