'Merging 2 dataframes and sorting by datetime Pandas Python
I want to produce a code where it creates an additional table to the dataframe data. The new dataframe data2 will have the following changes:
- label will be
Newinstead ofOld col1's last index will be deletedcol2's first index will be deleteddatewill be first index will be deleted and all date values will be subtracted by 1 minute
Then I want to concatenate the two data frames to make one data frame called merge I want to sort the dataframe by dates. Since the first index of data2 is dropped the order of merge should be in order of label: New, Old, New, Old. How can I subtract 1 minute from date_mod and merge the two data frames in order of dates?
import pandas as pd
d = {'col1': [4, 5, 2, 2, 3, 5, 1, 1, 6], 'col2': [6, 2, 1, 7, 3, 5, 3, 3, 9],
'label':['Old','Old','Old','Old','Old','Old','Old','Old','Old'],
'date': ['2022-01-24 10:07:02', '2022-01-27 01:55:03', '2022-01-30 19:09:03', '2022-02-02 14:34:06',
'2022-02-08 12:37:03', '2022-02-10 03:07:02', '2022-02-10 14:02:03', '2022-02-11 00:32:25',
'2022-02-12 21:42:03']}
data = pd.DataFrame(d)
'''
Additional Dataframe
label will have New
'col1'`s last index will be deleted
'col2'`s first index will be deleted
'date' will be first index will be deleted and all date values will be subtracted by 1 minute
'''
a = data['col1'].drop(data['col1'].index[-1])
b = data['col2'].drop(data['col2'].index[0])
# subtract the date_mod by 1 minute
date_mod = pd.to_datetime(data['date'][1:])
data2 = pd.DataFrame({'col1':a,'col2':b,
'label':['New','New','New','New','New','New','New','New'],
'date': date_mod})
'''
Merging data and data2
Sort by 'date'
Should go in order as Old, New, Old, New ...
The length of the columns are 1 less than of data bc of the dropped indexes
'''
merge=pd.merge(data,displayer)
Solution 1:[1]
the simplest way I think off, - place all adjustments into the function and apply to the copy of the original dataframe, later simply concat and sort:
data.date = pd.to_datetime(data.date) # converting column date str values to datetime to deduct 1minute later
def adjust_data(df):
df['col1'] = df['col1'].drop(df['col1'].index[-1])
df['col2'] = df['col2'].drop(df['col2'].index[0])
df.date = df.date - pd.Timedelta(minutes=1) # subtract the datetime by 1 minute
df.label = df.label.replace('Old','New') # change values in the column "label"
data2 = data.copy()
adjust_data(data2) # apply function to data2
# concat both dataframes and sort by column "date"
merge = pd.concat([data,data2], axis=0).sort_values(by=['date']).reset_index(drop=True)
print(merge)
out:
col1 col2 label date
0 4.0 NaN New 2022-01-24 10:06:02
1 4.0 6.0 Old 2022-01-24 10:07:02
2 5.0 2.0 New 2022-01-27 01:54:03
3 5.0 2.0 Old 2022-01-27 01:55:03
4 2.0 1.0 New 2022-01-30 19:08:03
5 2.0 1.0 Old 2022-01-30 19:09:03
6 2.0 7.0 New 2022-02-02 14:33:06
7 2.0 7.0 Old 2022-02-02 14:34:06
8 3.0 3.0 New 2022-02-08 12:36:03
9 3.0 3.0 Old 2022-02-08 12:37:03
10 5.0 5.0 New 2022-02-10 03:06:02
11 5.0 5.0 Old 2022-02-10 03:07:02
12 1.0 3.0 New 2022-02-10 14:01:03
13 1.0 3.0 Old 2022-02-10 14:02:03
14 1.0 3.0 New 2022-02-11 00:31:25
15 1.0 3.0 Old 2022-02-11 00:32:25
16 NaN 9.0 New 2022-02-12 21:41:03
17 6.0 9.0 Old 2022-02-12 21:42:03
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
