Add intermediate rows in a dataframe based on the previous record

Given the following dataframe:

ID direction country time
0 IN USA 12:10
0 OUT FRA 14:20
0 OUT ESP 16:11
1 IN GER 11:13
1 OUT USA 10:29
2 OUT USA 09:21
2 OUT ESP 21:33

I would like to add the following functionality to the above dataframe:

If two consecutive rows with the same ID both have "direction" equal to OUT, an intermediate row should be created between them, copying the data of the first OUT row but with the direction changed to IN.

Here is an example applied to the above dataframe:

ID direction country time
0 IN USA 12:10
0 OUT FRA 14:20
0 IN FRA 14:20
0 OUT ESP 16:11
1 IN GER 11:13
1 OUT USA 10:29
2 OUT USA 09:21
2 IN USA 09:21
2 OUT ESP 21:33

Thank you for your help.



Solution 1:[1]

Maintain a new dataframe

dfNew = pd.DataFrame()

and loop through each row of the existing dataframe (iterrows yields rows, whereas iteritems would yield columns).

for index, item in dfOld.iterrows():

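Look at the value under direction with every loop, and if it is IN, take that entire row and add it to the new dataframe. DataFrame.append was removed in pandas 2.0 and never modified the dataframe in place, so use pd.concat and assign the result back.

dfNew = pd.concat([dfNew, item.to_frame().T], ignore_index=True)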

If it is OUT, add the entire row as above; when the next row is also OUT for the same ID, also create a new row with the direction set to IN

dfNew.loc[len(dfNew.index)] = [value1, value2, value3, ...]

or edit the existing row (contained in item) and add it to the new dataframe as well.
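
The steps above can be sketched end to end as follows. This is only one possible reading of the approach, not the answerer's exact code: it collects the rows in a plain list and builds the new dataframe once at the end, instead of growing it inside the loop. The function name add_intermediate_rows is hypothetical, and the sample data is copied from the question.

```python
import pandas as pd

# Sample data copied from the question.
dfOld = pd.DataFrame(
    {
        "ID": [0, 0, 0, 1, 1, 2, 2],
        "direction": ["IN", "OUT", "OUT", "IN", "OUT", "OUT", "OUT"],
        "country": ["USA", "FRA", "ESP", "GER", "USA", "USA", "ESP"],
        "time": ["12:10", "14:20", "16:11", "11:13", "10:29", "09:21", "21:33"],
    }
)


def add_intermediate_rows(df):
    """Hypothetical helper: after each OUT row that is directly followed by
    another OUT row with the same ID, insert a copy with direction IN."""
    records = df.to_dict(orient="records")
    rows = []
    # Pair every row with the one after it; the last row is paired with None.
    for current, following in zip(records, records[1:] + [None]):
        rows.append(current)
        if (
            following is not None
            and current["direction"] == "OUT"
            and following["direction"] == "OUT"
            and current["ID"] == following["ID"]
        ):
            # Same data as the first OUT row, with only the direction changed.
            rows.append({**current, "direction": "IN"})
    return pd.DataFrame(rows, columns=df.columns)


dfNew = add_intermediate_rows(dfOld)
print(dfNew)
```

Building the result from a list avoids copying the whole dataframe on every iteration, which is what repeated concatenation inside a loop would do.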

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 M B