'Insert new row in a DataFrame if condition is met
I have a dataframe with two date columns:
Brand Start Date Finish Date Check
0 1 2013-03-16 2014-03-02 Consecutive
1 2 2014-03-03 2015-09-05 Consecutive
2 3 2015-12-12 2016-12-12 Non Consecutive
3 4 2017-01-01 2017-06-01 Non consecutive
4 5 2017-06-02 2019-02-20 Consecutive
I created a new column (check column) checking if the start date is consecutive of the finish date in the previous row, which I populated with 'Consecutive' and 'Non consecutive'.
I want to insert a new row where the value of the check column is 'Non consecutive' that contains, as Start date, the date of the column 'finish date' + 1 day (consecutive with previous row) and as 'finish date' the date of the column Finish Date - 1 day (consecutive with next row). So indexes 2 and 4 will be the new rows
Brand Start Date Finish Date
0 1 2013-03-16 2014-03-02
1 2 2014-03-03 2015-09-05
2 3 2015-09-06 2015-12-11
3 3 2015-12-12 2016-12-12
4 4 2016-12-13 2016-12-31
5 4 2017-01-01 2017-06-01
6 5 2017-06-02 2019-02-20
How can I achieve this?
Solution 1:[1]
date_format = '%Y-%m-%d'
rows = df.index[~df['Consecutive']]
df2 = pd.DataFrame(columns=df.columns, index=rows)
res = []
for row in rows:
df2.loc[row, :] = df.loc[row, :].copy()
df2.loc[row, 'Start Date'] = datetime.strftime(datetime.strptime(df.loc[row, 'Start Date'], date_format) + timedelta(days=1), date_format)
df2.loc[row, 'Finish Date'] = datetime.strftime(datetime.strptime(df.loc[row+1, 'Finish Date'], date_format) - timedelta(days=1), date_format)
df3 = pd.concat([df, df2]).sort_values(['Brand', 'Start Date']).reset_index(drop=True)
This uses sorting to put the rows in the correct place. If your df is big the sorting could be the slowest part and you could consider adding the rows one at a time into the correct place see here.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | SiP |
