Add intermediate rows in a dataframe based on the previous record
Consider the following dataframe:
| ID | direction | country | time |
|---|---|---|---|
| 0 | IN | USA | 12:10 |
| 0 | OUT | FRA | 14:20 |
| 0 | OUT | ESP | 16:11 |
| 1 | IN | GER | 11:13 |
| 1 | OUT | USA | 10:29 |
| 2 | OUT | USA | 09:21 |
| 2 | OUT | ESP | 21:33 |
I would like to add the following functionality to the above dataframe:
If two consecutive rows with the same ID both have the attribute "direction" equal to OUT, an intermediate row should be inserted between them: a copy of the first OUT row with the direction changed to IN.
Here is the expected result for the above dataframe:
| ID | direction | country | time |
|---|---|---|---|
| 0 | IN | USA | 12:10 |
| 0 | OUT | FRA | 14:20 |
| 0 | IN | FRA | 14:20 |
| 0 | OUT | ESP | 16:11 |
| 1 | IN | GER | 11:13 |
| 1 | OUT | USA | 10:29 |
| 2 | OUT | USA | 09:21 |
| 2 | IN | USA | 09:21 |
| 2 | OUT | ESP | 21:33 |
Thank you for your help.
Solution 1:[1]
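Maintain a new dataframe with the same columns as the existing one
dfNew = pd.DataFrame(columns=dfOld.columns)
and loop through each row of the existing dataframe. Note that iteritems() iterates over columns, not rows, so use iterrows() here:
for index, item in dfOld.iterrows():
Look at the value under direction on every iteration, and if it is IN, append that entire row to the new dataframe. DataFrame.append returned a new dataframe instead of modifying dfNew in place, and it was removed in pandas 2.0, so assign the row by position:
dfNew.loc[len(dfNew.index)] = item.tolist()
If it is OUT, add the entire row as above, and, when the next row is also OUT with the same ID (the condition in the question), also create a new row
dfNew.loc[len(dfNew.index)] = [value1, value2, value3, ...]
or edit the existing row (contained in item) by setting its direction to IN and add it to the new dataframe as well.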
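The row-by-row idea above can be sketched as follows. This is a minimal sketch under some assumptions, not the answerer's exact code: the function name add_intermediate_rows and the variable names df_old and df_new are hypothetical, the rows are assumed to already be in the desired order, and the copy is only inserted when the next row has the same ID and is also OUT, as the question asks. It collects rows in a plain list and builds the new dataframe once at the end.

```python
import pandas as pd

# Sample data copied from the question.
df_old = pd.DataFrame({
    "ID": [0, 0, 0, 1, 1, 2, 2],
    "direction": ["IN", "OUT", "OUT", "IN", "OUT", "OUT", "OUT"],
    "country": ["USA", "FRA", "ESP", "GER", "USA", "USA", "ESP"],
    "time": ["12:10", "14:20", "16:11", "11:13", "10:29", "09:21", "21:33"],
})


def add_intermediate_rows(df):
    """Insert an IN copy between two consecutive OUT rows of the same ID.

    Hypothetical helper: assumes columns named "ID" and "direction".
    """
    records = df.to_dict("records")  # one dict per row, in row order
    out = []
    for i, row in enumerate(records):
        out.append(row)
        nxt = records[i + 1] if i + 1 < len(records) else None
        if (
            nxt is not None
            and row["direction"] == "OUT"
            and nxt["direction"] == "OUT"
            and nxt["ID"] == row["ID"]
        ):
            # Copy of the first OUT row with only the direction changed.
            out.append({**row, "direction": "IN"})
    return pd.DataFrame(out, columns=df.columns)


df_new = add_intermediate_rows(df_old)
print(df_new)
```

Building a list of dicts and constructing the dataframe once avoids growing a dataframe inside the loop, which copies data on every step and gets slow for large inputs.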
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | M B |
