'how to lookup approximate in DataFrame Pandas?

I have a lab with 2 dataframes as below:

df1 = df1.head(10)
        HT  D
    0   A   BL
    1   B   0
    2   C   NaN
    3   D   7
    4   E   3
    5   F   NaN
    6   G   2


rng.head(10)
    Range   Group
0   0       group A
1   3       Group B
2   5       Group C
3   8       Group D
4   11      Group E

how I can find approximate value from column df['D'] in column rng['Range'] , sth like vlookup approximate in Excel. My output expected as below:

    HT  D    Group
0   A   BL   
1   B   0    GroupA
2   C   NaN
3   D   7    GroupC
4   E   3    GroupB
5   F   NaN  
6   G   2    GroupA


Solution 1:[1]

You can use merge_asof to create an intermediate, then join on the index. Your particular case is made more complex as you have mixed types in 'D', so you first need to subselect the numbers.

s = pd.to_numeric(df1['D'], errors='coerce')
df2 = pd.merge_asof(s[s.notna()].astype(int).reset_index(), rng,
                    left_on='D', right_on='Range')

out = df1.join(df2.drop(columns='D').set_index('index'))

output:

  HT    D  Range    Group
0  A   BL    NaN      NaN
1  B    0    0.0  group A
2  C  NaN    NaN      NaN
3  D    2    0.0  group A
4  E    3    3.0  Group B
5  F  NaN    NaN      NaN
6  G    7    5.0  Group C

Solution 2:[2]

You could define a function that returns the closest group for a given value of d and then use that to assign a group to every row in df1

def find_group(d):
    if np.isnan(d):
        return ''
    return rng.iloc[(rng['Range']-d).abs().argmin()]['Group']
df2 = df1.copy()
df2['Group'] = pd.to_numeric(df2['D'], errors='coerce').apply(lambda x: find_group(x))

This produces the output:

>>> print(df2)
  HT    D   Group
0  A   BL        
1  B    0  groupA
2  C  NaN        
3  D    2  GroupB
4  E    3  GroupB
5  F  NaN        
6  G    7  GroupD

EDIT: I believe the Excel lookup doesn't match the closest row but rather the largest value that is less than the lookup value. You can implement that logic in the find_group function as well:

def find_group(d):
    if np.isnan(d):
        return ''
    return rng[(d >= rng['Range'])].iloc[-1]['Group']

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 mozway
Solution 2