'Multiple column closest date match pandas
I have two dataframes as such:
df_a = pd.DataFrame({"col_a": ["a", "a", "b", "b"], "date_a": ["2019-01-03", "2019-02-04", "2020-02-02", "2020-03-15"]})
df_a.date_a = pd.to_datetime(df_a.date_a)
df_a.sort_values("date_a", inplace=True)
And
df_b = pd.DataFrame(
{
"col_b": ["a", "a", "a", "b", "b", "a", "b", "b"],
"date_b": [
"2020-03-10",
"2019-12-04",
"2017-05-15",
"2018-12-04",
"2020-01-15",
"2019-01-09",
"2019-01-10",
"2020-03-09",
],
})
df_b.date_b = pd.to_datetime(df_b.date_b)
df_b.sort_values("date_b", inplace=True)
I would like to merge df_b on df_a by the closest previous date conditional on the first column of each dataframe, with this as the result:
| Index | col_a | date_a | col_b | date_b |
|---|---|---|---|---|
| 0 | a | 2019-01-03 | a | 2017-05-15 |
| 1 | a | 2019-02-04 | a | 2019-01-09 |
| 2 | b | 2020-02-02 | b | 2020-01-15 |
| 3 | b | 2020-03-15 | b | 2020-03-09 |
Unfortunetely pd.merge_asof does not allow for multiple keys
Solution 1:[1]
Use merge_asof with defult direction='backward' parameter:
df = pd.merge_asof(df_a,
df_b,
left_on='date_a',
right_on='date_b',
left_by='col_a',
right_by='col_b')
print (df)
col_a date_a col_b date_b
0 a 2019-01-03 a 2017-05-15
1 a 2019-02-04 a 2019-01-09
2 b 2020-02-02 b 2020-01-15
3 b 2020-03-15 b 2020-03-09
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | jezrael |
