Python Dataframe find closest matching value with a tolerance

I have a data frame consisting of lists as elements. I want to find the closest matching values within a percentage of a given value. My code:

df = pd.DataFrame({'A':[[1,2],[4,5,6]]})
df
           A
0     [1, 2]
1  [4, 5, 6]

# in each row, let's find the value (and its index) that matches 5 within a 20% tolerance
val = 5
tol = 0.2 # accept values within 20% of 5, i.e. between 4 and 6
df['Matching_index'] = (df['A'].map(np.array)-val).map(abs).map(np.argmin)

Current output:

df
           A     Matching_index
0     [1, 2]     1                # 2 is the closest value to 5, but it is outside the tolerance, so this is wrong
1  [4, 5, 6]     1                # 5 matches 5, correct.

Expected output:

df
           A     Matching_index
0     [1, 2]     NaN              # No matching value, hence NaN
1  [4, 5, 6]     1                # 5 matches with 5, correct.

Solution 1:^[1]

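The idea is to take the absolute difference from val, replace the differences that fall outside the tolerance with missing values, and then call np.nanargmin. Because np.nanargmin raises an error when every value is missing, the function first checks m.any() and returns NaN when nothing matches: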

def f(x):
    a = np.abs(np.array(x)-val)
    m = a <= val * tol
    return np.nanargmin(np.where(m, a, np.nan)) if m.any() else np.nan
    
df['Matching_index']  = df['A'].map(f)

print (df)
           A  Matching_index
0     [1, 2]             NaN
1  [4, 5, 6]             1.0
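
The function above reads val and tol from the enclosing scope. As a minimal sketch of the same idea with the parameters passed explicitly, something like the following should work. The name closest_index is hypothetical, and the use of abs(val) (so that a negative target still gives a positive tolerance width) is an assumption that is not part of the original answer:

```python
import numpy as np
import pandas as pd

def closest_index(values, val, tol):
    """Return the position of the value closest to val, or NaN if none is within tol."""
    # Absolute distance of every element from the target value.
    diffs = np.abs(np.asarray(values, dtype=float) - val)
    # Assumption: tolerance is a fraction of |val|, so negative targets also work.
    within = diffs <= abs(val) * tol
    if not within.any():
        return np.nan
    # Mask out-of-tolerance distances with NaN, then take the smallest remaining one.
    return int(np.nanargmin(np.where(within, diffs, np.nan)))

df = pd.DataFrame({'A': [[1, 2], [4, 5, 6]]})
df['Matching_index'] = df['A'].map(lambda x: closest_index(x, val=5, tol=0.2))
```

Because the first row has no match, the column contains NaN and is therefore stored as float, as in the output shown above.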

Pandas solution:

df1 = pd.DataFrame(df['A'].tolist(), index=df.index).sub(val).abs()

df['Matching_index'] = df1.where(df1 <= val * tol).dropna(how='all').idxmin(axis=1)
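
For reference, here is the pandas approach as a self-contained sketch with the intermediate steps separated and commented. The variable names df1 and masked are only illustrative. Rows without any match are dropped before idxmin and come back as NaN when the result is aligned on the index during assignment:

```python
import numpy as np
import pandas as pd

val = 5
tol = 0.2

df = pd.DataFrame({'A': [[1, 2], [4, 5, 6]]})

# Expand the lists into columns (shorter lists are padded with NaN),
# then compute the absolute distance from val.
df1 = pd.DataFrame(df['A'].tolist(), index=df.index).sub(val).abs()

# Keep only the distances inside the tolerance; everything else becomes NaN.
masked = df1.where(df1 <= val * tol)

# Drop rows with no match at all, then take the column label of the smallest distance.
# The column labels are the positions within the original lists.
df['Matching_index'] = masked.dropna(how='all').idxmin(axis=1)
```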

Solution 2:^[2]

I'm not sure if you want all the matching indexes or just a count of them.

Try this:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A':[[1,2],[4,5,6,7,8]]})

val = 5
tol = 0.3

def closest(arr,val,tol):
    idxs = [ idx for idx,el in enumerate(arr) if (np.abs(el - val) < val*tol)]
    result = len(idxs) if len(idxs) != 0 else np.nan
    return result

df['Matching_index'] = df['A'].apply(closest, args=(val,tol,))
df

If you want all the indexes, just return idxs instead of len(idxs).
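
A minimal sketch of that variant is shown below. The function name matching_indexes is hypothetical, and returning an empty list (rather than NaN) for rows without a match is an assumption made here to keep the column type uniform:

```python
import pandas as pd

def matching_indexes(arr, val, tol):
    """Return the positions of all elements strictly within val*tol of val."""
    return [idx for idx, el in enumerate(arr) if abs(el - val) < val * tol]

df = pd.DataFrame({'A': [[1, 2], [4, 5, 6, 7, 8]]})

val = 5
tol = 0.3

# Each cell of the new column holds a list of matching positions.
df['Matching_indexes'] = df['A'].apply(matching_indexes, args=(val, tol))
```

With val = 5 and tol = 0.3 the threshold is 1.5, so only 4, 5 and 6 in the second row qualify.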

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1
Solution 2
