Compare dates from two different dataframes

Is there an optimal way to do this?

Each Email in df1 needs to be looked up in the Email column of df2. If a match is found, I then need to check that df1.CreationDate falls between df2.StartDate and df2.StartDate + 3 months. Both dataframes contain ~200,000 records.

df1:

   BillNum  Email              CreationDate
0  101      [email protected]  24-Mar-2022
1  102      [email protected]  10-May-2019
2  103      [email protected]  20-Mar-2022

df2:


   RefNum  Email              StartDate
0  13      [email protected]  01-Mar-2022
1  12      [email protected]  15-Mar-2022
2  11      [email protected]  12-Feb-2022

   
Output df:

   BillNum       Email        CreationDate  RefNum     StartDate
0   101      [email protected]     24-Mar-2022    12      15-Mar-2022


Solution 1:[1]

I think you can do a merge and then filter:

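import pandas as pd
# Parse the date strings so they compare as dates rather than text
df1['CreationDate'] = pd.to_datetime(df1['CreationDate'])
df2['StartDate'] = pd.to_datetime(df2['StartDate'])
# Inner join on the shared Email column
tmp = df1.merge(df2, on='Email')
# DateOffset(months=3) adds three calendar months; MonthBegin(3) would snap to the first of a month
tmp = tmp[(tmp.CreationDate > tmp.StartDate) & (tmp.CreationDate < tmp.StartDate + pd.DateOffset(months=3))]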

Output:

>>> tmp
   BillNum          Email CreationDate  RefNum  StartDate
0      101  [email protected]   2022-03-24      12 2022-03-15
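
For anyone who wants to try the merge-then-filter idea end to end, here is a minimal sketch. The email addresses are hypothetical placeholders (the real ones are not visible in the question), chosen so that only BillNum 101 and RefNum 12 share an address, as the expected output suggests. It also assumes the window should include both endpoints, which is what `Series.between` does by default; adjust that if the bounds should be strict.

```python
import pandas as pd

# Hypothetical sample data: the addresses are placeholders, not the ones from the question.
df1 = pd.DataFrame({
    "BillNum": [101, 102, 103],
    "Email": ["a@example.com", "b@example.com", "c@example.com"],
    "CreationDate": ["24-Mar-2022", "10-May-2019", "20-Mar-2022"],
})
df2 = pd.DataFrame({
    "RefNum": [13, 12, 11],
    "Email": ["d@example.com", "a@example.com", "e@example.com"],
    "StartDate": ["01-Mar-2022", "15-Mar-2022", "12-Feb-2022"],
})

# Parse with an explicit format so day and month are never swapped.
df1["CreationDate"] = pd.to_datetime(df1["CreationDate"], format="%d-%b-%Y")
df2["StartDate"] = pd.to_datetime(df2["StartDate"], format="%d-%b-%Y")

# Inner join keeps only rows whose Email appears in both frames.
merged = df1.merge(df2, on="Email", how="inner")

# End of the window: StartDate plus three calendar months.
window_end = merged["StartDate"] + pd.DateOffset(months=3)

# Keep rows whose CreationDate lies in [StartDate, StartDate + 3 months].
in_window = merged["CreationDate"].between(merged["StartDate"], window_end)
result = merged[in_window].reset_index(drop=True)
print(result)
```

One thing to watch with ~200,000 rows per frame: if an Email occurs many times on both sides, the merge produces every pairing for that address before the filter runs, so the intermediate frame can be much larger than either input.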

Solution 2:[2]

After the merge, you can filter the dates using a timedelta specified in days.

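import datetime
# Assumes CreationDate and StartDate were already converted with pd.to_datetime
df3 = df1.merge(df2, on='Email')
# 90 days is used here as an approximation of the 3-month window
df3 = df3[(df3['CreationDate'] >= df3['StartDate']) & (df3['CreationDate'] - df3['StartDate'] < datetime.timedelta(days=90))]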
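
Note that a fixed 90-day span and a three-calendar-month span are not the same window. A small sketch of the difference, using the 15-Mar-2022 start date from the question (the variable names are only for illustration):

```python
import pandas as pd

start = pd.Timestamp("2022-03-15")

# Fixed span: exactly 90 days after the start date.
end_days = start + pd.Timedelta(days=90)

# Calendar span: same day of the month, three months later.
end_months = start + pd.DateOffset(months=3)

print(end_days.date())    # 2022-06-13
print(end_months.date())  # 2022-06-15
```

Which one is right depends on how "3 months" is defined for the data; for dates near the edge of the window the two filters can disagree.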

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 richardec
Solution 2 Y U