'Multiple column closest date match pandas

I have two dataframes as such:

df_a = pd.DataFrame({"col_a": ["a", "a", "b", "b"], "date_a": ["2019-01-03", "2019-02-04", "2020-02-02", "2020-03-15"]})
df_a.date_a = pd.to_datetime(df_a.date_a)
df_a.sort_values("date_a", inplace=True)

And

df_b = pd.DataFrame(
{
    "col_b": ["a", "a", "a", "b", "b", "a", "b", "b"],
    "date_b": [
        "2020-03-10",
        "2019-12-04",
        "2017-05-15",
        "2018-12-04",
        "2020-01-15",
        "2019-01-09",
        "2019-01-10",
        "2020-03-09",
    ],
})

df_b.date_b = pd.to_datetime(df_b.date_b)
df_b.sort_values("date_b", inplace=True)

I would like to merge df_b on df_a by the closest previous date conditional on the first column of each dataframe, with this as the result:

Index col_a date_a col_b date_b
0 a 2019-01-03 a 2017-05-15
1 a 2019-02-04 a 2019-01-09
2 b 2020-02-02 b 2020-01-15
3 b 2020-03-15 b 2020-03-09

Unfortunetely pd.merge_asof does not allow for multiple keys



Solution 1:[1]

Use merge_asof with defult direction='backward' parameter:

df = pd.merge_asof(df_a,
                   df_b, 
                   left_on='date_a', 
                   right_on='date_b', 
                   left_by='col_a', 
                   right_by='col_b')
print (df)
  col_a     date_a col_b     date_b
0     a 2019-01-03     a 2017-05-15
1     a 2019-02-04     a 2019-01-09
2     b 2020-02-02     b 2020-01-15
3     b 2020-03-15     b 2020-03-09

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 jezrael