'Is there any alternative of pandas' merge_asof when joining on multiple columns?
I need to merge 2 data frames on 3 key columns, one contains strings, one contains integers and the last contains floats. On the string and integer columns, I would need an exact match, while on the float column I want to get the values of the right data frame for the nearest corresponding float key in the left data frame. I thought that pandas' merge_asof function would be good, but unfortunately, it can be used only for a single, numerical column to join on.
I have approached this problem with cross merge, but unfortunately, it won't work for the real problem, as each data frame has ~2 million rows. (and it's not giving the correct values either...)
import pandas as pd
import numpy as np
import itertools
df1 = pd.DataFrame(data = itertools.product(['a', 'b', 'c'],
[0, 1, 2, 3, 4],
[0, 0.25, 0.5, 0.75, 1]),
columns = ['key1', 'key2', 'key3'])
df2 = df1.copy()
df2['key3'] = np.random.rand(len(df2))
df2['values'] = np.random.rand(len(df2))
merged = df1.merge(df2, how = 'cross')
merged['key3_diff'] = np.abs(merged['key3_x'] - merged['key3_y'])
aux = merged[['key1_x', 'key2_x',
'key3_x', 'key3_diff']].groupby(['key1_x',
'key2_x',
'key3_x']).min().reset_index()
res = merged[merged['key1_x'].isin(aux['key1_x']) &
merged['key2_x'].isin(aux['key2_x']) &
merged['key3_diff'].isin(aux['key3_diff'])][['key1_x',
'key2_x',
'key3_x',
'values']].rename(columns = {'key1_x': 'key1',
'key2_x': 'key2',
'key3_x': 'key3'})
Is there any alternative besides implementing this calculation sequentially (maybe with numba and jit)?
Solution 1:[1]
This is a solution using pandas's merge_asof. I didn't realize the 'by' argument. This is equivalent with the accepted answer.
df1 = df1.sort_values(by = 'key3')
df2 = df2.sort_values(by = 'key3')
res3 = pd.merge_asof(df1, df2, by = ['key1', 'key2'], on = 'key3', direction = 'nearest')
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
