'Python: Find max value from set neighbourhood of a point that is closest to a point from another dataset, the first df is very large

I have two columns df1 and df2.

df1 has two columns. I'd like to find the index idx of row from df1, that contains in col1 the minimum distance from the value of df2 col1.

Next, I want to find the max value of df1 col2 from neighborhood [idx-10, idx+10]. Then attach that value to df2 to corresponding row for which the min distance was found.

Here is an example of the two columns:

df1 = pd.DataFrame({"A": [1,1.5,2.5,3,3.5,3.95,4.55],
                    "B": [0.25, 0.6, 1.34, 0.75, 0.16, 1.04, 0.99]})
      A     B
0  1.00  0.25
1  1.50  0.60
2  2.50  1.34
3  3.00  0.75
4  3.50  0.16
5  3.95  1.04
6  4.55  0.99


 df2 = pd.DataFrame({"A1": [1.06,2.34, 3.70],
                     "C": ["pink", "blue", "orange"]})  

  A1       C
0  1.06    pink
1  2.34    blue
2  3.70  orange

Here is the desired result:

df3 = pd.DataFrame({"A1":[1.06,2.34, 3.70],
                    "B": [0.25, 0.6, 1.04],
                    "C": ["pink", "blue", "orange"] })


  A1     B       C
0  1.06  0.25    pink
1  2.34  0.60    blue
2  3.70  1.04  orange

I know that I could merge these data frames, but the problem is that one of them, df1, is very large, so it would take ridiculous amount of memory to compute all possible distances of columns A and A1. So, maybe there is something more efficient than that? I know that this example is not exact, because the column B of the resulting data frame consists of values that exactly match the row of A where A[i]-A1[j] = min|A[i]-A1[j]|,

whereas I need the B value to be the maximum of B from rows where A = [A[i]-5, A[i]+5].



Solution 1:[1]

The logic for the provided output is the fully clear,but if you want to merge on the closest value in A/A1, use pandas.merge_asof:

df3 = pd.merge_asof(df2, df1.rename(columns={'A': 'A1'}),
                    on='A1', direction='nearest')

the condition on idx±10 is also unclear, but if you want to set a maximum distance use tolerance=10

output:

     A1       C     B
0  1.06    pink  0.25
1  2.34    blue  1.34
2  3.70  orange  0.16

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 mozway