'Select rows together with their previous rows given a condition for a pandas dataframe
Given a DataFrame, where the "time" attribute is a Python timedelta object. The index attribute is internal to the pandas dataframe, I simply show it to make it easier to understand the problem.
| index | ID | time | country |
|---|---|---|---|
| 0 | 0 | NaT | ITL |
| 1 | 0 | 0 days 01:00:00 | USA |
| 2 | 0 | NaT | ITL |
| 3 | 0 | NaT | ESP |
| 4 | 0 | 2 days 00:05:00 | USA |
| 5 | 1 | NaT | ITL |
| 6 | 1 | 0 days 03:35:00 | USA |
| 7 | 1 | NaT | ESP |
| 8 | 2 | NaT | USA |
| 9 | 3 | NaT | FRA |
| 10 | 3 | 2 days 04:14:10 | ESP |
| 11 | 3 | 1 days 03:35:00 | ITL |
I divide the functionality in 2 steps:
- Select the rows whose value of the attribute "time" is different from NaT.
| index | ID | time | country |
|---|---|---|---|
| 1 | 0 | 0 days 01:00:00 | USA |
| 4 | 0 | 2 days 00:05:00 | USA |
| 6 | 1 | 0 days 03:35:00 | USA |
| 10 | 3 | 2 days 04:14:10 | ESP |
| 11 | 3 | 1 days 03:35:00 | ITL |
- Select for each row of the above subset, its previous index row as long as it belongs to the same ID.
| index | ID | time | country |
|---|---|---|---|
| 0 | 0 | NaT | ITL |
| 1 | 0 | 0 days 01:00:00 | USA |
| 3 | 0 | NaT | ESP |
| 4 | 0 | 2 days 00:05:00 | USA |
| 5 | 1 | NaT | ITL |
| 6 | 1 | 0 days 03:35:00 | USA |
| 9 | 3 | NaT | FRA |
| 10 | 3 | 2 days 04:14:10 | ESP |
| 11 | 3 | 1 days 03:35:00 | ITL |
The DataFrame I want is the one above (without the index column, which is internal to pandas). I have split the functionality into 2 steps to make the problem understandable, however I need to get to step 2, it is not necessary that the functionality of step 1 comes implicit.
Thanks for your help in advance.
Solution 1:[1]
Chain masks for testing non missing values with shifted values per groups by DataFrameGroupBy.shift with | for bitwise OR in boolean indexing:
m = df['time'].notna()
df = df[m | m.groupby(df['ID']).shift(-1,fill_value=False)]
print (df)
ID time country
0 0 NaT ITL
1 0 0 days 01:00:00 USA
3 0 NaT ESP
4 0 2 days 00:05:00 USA
5 1 NaT ITL
6 1 0 days 03:35:00 USA
9 3 NaT FRA
10 3 2 days 04:14:10 ESP
11 3 1 days 03:35:00 ITL
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | jezrael |
