'how to lookup approximate in DataFrame Pandas?
I have a lab with 2 dataframes as below:
df1 = df1.head(10)
HT D
0 A BL
1 B 0
2 C NaN
3 D 7
4 E 3
5 F NaN
6 G 2
rng.head(10)
Range Group
0 0 group A
1 3 Group B
2 5 Group C
3 8 Group D
4 11 Group E
how I can find approximate value from column df['D'] in column rng['Range'] , sth like vlookup approximate in Excel. My output expected as below:
HT D Group
0 A BL
1 B 0 GroupA
2 C NaN
3 D 7 GroupC
4 E 3 GroupB
5 F NaN
6 G 2 GroupA
Solution 1:[1]
You can use merge_asof to create an intermediate, then join on the index. Your particular case is made more complex as you have mixed types in 'D', so you first need to subselect the numbers.
s = pd.to_numeric(df1['D'], errors='coerce')
df2 = pd.merge_asof(s[s.notna()].astype(int).reset_index(), rng,
left_on='D', right_on='Range')
out = df1.join(df2.drop(columns='D').set_index('index'))
output:
HT D Range Group
0 A BL NaN NaN
1 B 0 0.0 group A
2 C NaN NaN NaN
3 D 2 0.0 group A
4 E 3 3.0 Group B
5 F NaN NaN NaN
6 G 7 5.0 Group C
Solution 2:[2]
You could define a function that returns the closest group for a given value of d and then use that to assign a group to every row in df1
def find_group(d):
if np.isnan(d):
return ''
return rng.iloc[(rng['Range']-d).abs().argmin()]['Group']
df2 = df1.copy()
df2['Group'] = pd.to_numeric(df2['D'], errors='coerce').apply(lambda x: find_group(x))
This produces the output:
>>> print(df2)
HT D Group
0 A BL
1 B 0 groupA
2 C NaN
3 D 2 GroupB
4 E 3 GroupB
5 F NaN
6 G 7 GroupD
EDIT: I believe the Excel lookup doesn't match the closest row but rather the largest value that is less than the lookup value. You can implement that logic in the find_group function as well:
def find_group(d):
if np.isnan(d):
return ''
return rng[(d >= rng['Range'])].iloc[-1]['Group']
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | mozway |
| Solution 2 |
