'Pandas how to delete a cell in a column and shift the column up?
I have a following dataframe:
| Start | Finish |
|---|---|
| 16.11.2021 21:36:31 | 16.11.2021 21:37:31 |
| 16.11.2021 21:39:31 | 16.11.2021 21:40:28 |
| 16.11.2021 21:41:39 | 16.11.2021 21:40:31 |
| 16.11.2021 21:56:31 | 16.11.2021 21:48:31 |
| 16.11.2021 21:50:35 | |
| 16.11.2021 21:58:56 |
I have errors in my dataset - finish time occurs before start time in some rows due to bad data. In rows where that occurs, I'd like to remove the cell in the 'Finish' column and shift other cells up until start time <= finish time. Therefore, the expected output would be:
| Start | Finish |
|---|---|
| 16.11.2021 21:36:31 | 16.11.2021 21:37:31 |
| 16.11.2021 21:39:31 | 16.11.2021 21:40:28 |
| 16.11.2021 21:41:39 | 16.11.2021 21:48:31 |
| 16.11.2021 21:56:31 | 16.11.2021 21:58:56 |
My attempt:
df['start_time'] = pd.to_datetime(df['start_time'])
df['finish_time'] = pd.to_datetime(df['finish_time'])
for i,row in df.iterrows():
if row["start_time"] > row["finish_time"]:
row["finish_time"] = ""
row.dropna()
df.at[i, 'finish_time'] = row["finish_time"]
I appreciate any help in advance.
Solution 1:[1]
You can use pandas.merge_asof.
This is assuming that the values are sorted in both columns (if not, add a sort_values() step).
df['Start'] = pd.to_datetime(df['Start'])
df['Finish'] = pd.to_datetime(df['Finish'])
pd.merge_asof(df['Start'].dropna(), df['Finish'],
left_on='Start', right_on='Finish', direction='forward')
Output:
Start Finish
0 2021-11-16 21:36:31 2021-11-16 21:37:31
1 2021-11-16 21:39:31 2021-11-16 21:40:28
2 2021-11-16 21:41:39 2021-11-16 21:48:31
3 2021-11-16 21:56:31 2021-11-16 21:58:56
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | mozway |
