Compare dates from two different DataFrames
Is there an efficient way to do this?
Each Email in df1 needs to be matched against the Email column of df2. If a match is found, I need to check whether df1.CreationDate falls between df2.StartDate and df2.StartDate + 3 months. Both DataFrames contain about 200,000 records.
df1:
BillNum Email CreationDate
0 101 [email protected] 24-Mar-2022
1 102 [email protected] 10-May-2019
2 103 [email protected] 20-Mar-2022
df2:
RefNum Email StartDate
0 13 [email protected] 01-Mar-2022
1 12 [email protected] 15-Mar-2022
2 11 [email protected] 12-Feb-2022
Expected output:
BillNum Email CreationDate RefNum StartDate
0 101 [email protected] 24-Mar-2022 12 15-Mar-2022
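With about 200,000 rows on each side, the cost of merging first and filtering afterwards depends on how often an email repeats: an email that appears m times in df1 and n times in df2 produces m × n merged rows. A minimal sketch that computes the merged row count without performing the merge, assuming a hypothetical helper name `merged_row_count` and made-up example.com addresses (the real addresses in the question are obfuscated):

```python
import pandas as pd


def merged_row_count(left: pd.DataFrame, right: pd.DataFrame, key: str) -> int:
    """Hypothetical helper: rows an inner merge on `key` would produce, computed without merging."""
    left_counts = left[key].value_counts()
    right_counts = right[key].value_counts()
    # Each key contributes (#left rows) * (#right rows); keys present on only one side become NaN
    return int((left_counts * right_counts).dropna().sum())


# Made-up addresses standing in for the obfuscated ones in the question
df1 = pd.DataFrame({"BillNum": [101, 102, 103],
                    "Email": ["a@example.com", "b@example.com", "a@example.com"]})
df2 = pd.DataFrame({"RefNum": [13, 12, 11],
                    "Email": ["a@example.com", "a@example.com", "c@example.com"]})

print(merged_row_count(df1, df2, "Email"))  # 4
```

If the count is close to the input sizes, merge-then-filter is cheap; if it is much larger, the intermediate merged frame becomes the memory bottleneck.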
Solution 1:[1]
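I think you can do a merge on Email and then filter, keeping rows where CreationDate lies between StartDate and StartDate plus three calendar months:
import pandas as pd
df1['CreationDate'] = pd.to_datetime(df1['CreationDate'], format='%d-%b-%Y')
df2['StartDate'] = pd.to_datetime(df2['StartDate'], format='%d-%b-%Y')
tmp = df1.merge(df2, on='Email')  # inner join: keep only emails present in both frames
end = tmp.StartDate + pd.DateOffset(months=3)  # same day of month, three months later
tmp = tmp[tmp.CreationDate.between(tmp.StartDate, end)]  # inclusive on both ends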
Output:
>>> tmp
BillNum Email CreationDate RefNum StartDate
0 101 [email protected] 2022-03-24 12 2022-03-15
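The choice of offset for the window end matters. `pd.offsets.MonthBegin(3)` rolls forward to a month start, while `pd.DateOffset(months=3)` keeps the day of month. A small sketch comparing the two for the sample StartDate 2022-03-15 (the CreationDate 2022-06-05 is made up for illustration):

```python
import pandas as pd

start = pd.Timestamp("2022-03-15")  # StartDate of RefNum 12 in the sample

# MonthBegin(3): roll forward to the third month start after `start`
month_begin_end = start + pd.offsets.MonthBegin(3)
# DateOffset(months=3): same day of month, three calendar months later
date_offset_end = start + pd.DateOffset(months=3)

print(month_begin_end)  # 2022-06-01 00:00:00
print(date_offset_end)  # 2022-06-15 00:00:00

# A CreationDate in early June falls inside one window but not the other
creation = pd.Timestamp("2022-06-05")
print(creation < month_begin_end, creation <= date_offset_end)  # False True
```

`DateOffset(months=3)` clips to the last valid day when the target month is shorter, so 2022-11-30 maps to 2023-02-28.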
Solution 2:[2]
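After the merge you can filter with a timedelta given in days. Note that 90 days only approximates three calendar months.
import pandas as pd
df1['CreationDate'] = pd.to_datetime(df1['CreationDate'], format='%d-%b-%Y')  # subtraction needs datetime64
df2['StartDate'] = pd.to_datetime(df2['StartDate'], format='%d-%b-%Y')
df3 = df1.merge(df2, on='Email')
delta = df3['CreationDate'] - df3['StartDate']  # Timedelta per matched pair
df3 = df3[(delta >= pd.Timedelta(0)) & (delta < pd.Timedelta(days=90))]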
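How far a fixed 90-day window drifts from three calendar months depends on the start date. A small sketch, using made-up start dates, that measures the calendar-month window in days:

```python
import pandas as pd

# Made-up start dates chosen to show the spread
starts = pd.Series(pd.to_datetime(["2022-02-15", "2022-03-15", "2022-12-15"]))
calendar_end = starts + pd.DateOffset(months=3)

# Length of each three-calendar-month window in days, to compare against the fixed 90
window_days = (calendar_end - starts).dt.days.tolist()
print(window_days)  # [89, 92, 90]
```

So with StartDate 2022-03-15, a CreationDate of 2022-06-14 (91 days later) is within three calendar months but rejected by the 90-day filter.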
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | richardec |
| Solution 2 | Y U |
