'How to find values of one dataframe in another?

I have two dataframes. The values in the first dataframe are supposed to be found or retrieved from the second dataframe. The problem is, while the values contained in df1 is autogenerated from an application, the values in df2 are manually entered. To retrieve values of df1 in df2, I take into consideration the unique fields that should match a search, which are

  1. Name
  2. id
  3. date

The above fields work only for values or data that had been inputted on the same date. But because df2 has been manually inputed, the same row value in df1(data_today) might be found in row df2(date_tomorow).

Here is an example code

    import pandas as pd

    df1 = pd.DataFrame([
    ['Aa', 1.2, '26-1-2022'],
    ['Bb', 2.2, '27-1-2022'],
    ['Bb', 2.3, '28-1-2022'],
    ['Cc', 3.2, '26-1-2022']
    ], columns=['name', 'id', 'date'])

    df2 = pd.DataFrame([
    ['Aa', 1.2, '26-1-2022'],
    ['Bb', 2.2, '27-1-2022'],
    ['Bb', 2.3, '29-1-2022'],
    ['Cc', 3.2, '29-1-2022']
    ], columns=['name', 'id', 'date'])

    selected_rows = pd.DataFrame()

    for i in df1.index:
    name_i = df1._get_value(i, 'name')
    id_i = df1._get_value(i, 'id')
    date_i = df1._get_value(i, 'date')
    for j in df2.index:
        name_j = df2._get_value(j, 'name')
        id_i = df2._get_value(j, 'id')
        date_j = df2._get_value(j, 'date')

        if name_i == name_j and id_i == id_i and date_i == date_j:
            selected_rows = selected_rows.append(df1.loc[i])
            break

    print(selected_rows)

How can I extend my code to also include the other dates? Thanks.



Solution 1:[1]

You can include a date condition where the difference between the date from df1 and df2 doesn't surpass a specific number of days example 3, 4, or whatever you choose.

from datetime import datetime

def days_between(d1, d2):
    d1 = datetime.strptime(d1, "%Y-%m-%d")
    d2 = datetime.strptime(d2, "%Y-%m-%d")
    return abs((d2 - d1).days)

and in your code update this part:

if name_i == name_j and id_i == id_i and days_between(date_i,date_j)<4:
   selected_rows = selected_rows.append(df1.loc[i])
   break

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 alphaBetaGamma