Improve performance when fuzzy matching values in one list with values in another in Python

I have an issue where I need to take a list from an input, compare it to a list of relevant values, and place each relevant value in a new list to be output. Because the data is not 100% accurate, I have to fuzzy match each value from the input list against each value in the desired list, check the score, and append the desired value to the output list when the score is high enough.

So far I have created this function (I've added comments for clarity):

from fuzzywuzzy import fuzz  # assumed import; the question does not show where fuzz comes from


def refine_attributes(row, policy_list):
    """
    Function to identify the relevant attributes and refine the reported attributes to only relevant ones.
    :param row: data frame row whose 'attribute_original_name' holds the input list as a string
    :param policy_list: list of dictionaries, each holding a nested 'attributes' list
    :return: list of relevant attributes that fuzzy match the row's attributes
    """
    all_attributes = [] # this forms the desired list of relevant values
    for p in policy_list:
        value = next(iter(p))
        attributes_list = p.get(value, {}).get('attributes')
        for a in attributes_list:
            all_attributes.append(a) # This retrieves a nested list field from a list of dictionaries and adds each value separately to the new desired list.
    # Further refinement and tests performed on the desired list.
    all_attributes_processed = []
    for i in all_attributes:
        i = i.replace('"', '').strip()
        # Test the attribute is legitimate
        if len(i) > 1:
            all_attributes_processed.append(i)
######### The steps above here will be moved to a separate function to create an object to refer to, instead of doing so for each row. ############

    new_attributes = [] # this is the output list
    current_attributes = row['attribute_original_name'] # this is the input list
    current_attributes = current_attributes.replace('[','').replace(']','').replace("'", "").split(',') # this is a bit of preprocessing on the input list as it is given as a string in the input

### This is the section where each string in the lists is compared and scored
    for a in current_attributes:
        for attr in all_attributes_processed:
            ratio = fuzz.partial_ratio(a, attr)
            if ratio > 90:
                new_attributes.append(attr)
    return new_attributes

The issue with the above is that it is too slow. I suspect I could work a lambda function in here, but I can't see how best to do it. Any suggestions to speed this up would be greatly appreciated.
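As a rough sketch of the refactor mentioned in the comment inside the function, the reference list could be built once before the rows are processed. The name `build_reference_attributes` and the sample data are hypothetical, and the `or []` guard for a missing 'attributes' key is an added assumption, not something the original function does.

```python
def build_reference_attributes(policy_list):
    """Flatten and clean the policy attributes once, outside the per-row call."""
    cleaned = []
    for policy in policy_list:
        # Each policy is assumed to be a single-key dict: {name: {'attributes': [...]}}
        name = next(iter(policy))
        # 'or []' is an added guard for policies without an 'attributes' list.
        for attr in policy.get(name, {}).get('attributes') or []:
            attr = attr.replace('"', '').strip()
            if len(attr) > 1:  # same legitimacy test as the original function
                cleaned.append(attr)
    return cleaned


# Hypothetical sample data, shaped like the p_list described in the question.
sample_policies = [
    {'policy_a': {'attributes': ['"colour"', ' size ', 'x']}},
    {'policy_b': {'attributes': ['weight']}},
]
reference_attributes = build_reference_attributes(sample_policies)
```

The per-row function would then take `reference_attributes` as an argument instead of `policy_list`, so the flattening and cleaning no longer repeat for every row.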

PS: The lists are usually at most 20 strings long, but this needs to run for every row in a data frame with hundreds of thousands of rows.
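With short lists and many rows, one option that may help is to match each distinct input string only once and reuse the result. This is a minimal sketch, assuming the raw column values repeat across rows (the question does not say whether they do); `make_cached_matcher` and `demo_match` are hypothetical names, and `demo_match` is only a stand-in for the real parsing and fuzzy comparison.

```python
from functools import lru_cache


def make_cached_matcher(match_one):
    """Wrap a per-string matcher so each distinct input string is matched once."""
    @lru_cache(maxsize=None)
    def cached(raw):
        # Store a tuple so the cached value cannot be mutated by callers.
        return tuple(match_one(raw))
    return cached


# Hypothetical stand-in matcher that records its calls; the real one would
# parse the raw string and run the fuzzy comparison.
calls = []


def demo_match(raw):
    calls.append(raw)
    return [part.strip() for part in raw.split(',')]


cached_match = make_cached_matcher(demo_match)
column = ['a, b', 'c', 'a, b', 'a, b']  # stands in for the data frame column
results = [list(cached_match(raw)) for raw in column]
```

With pandas, the wrapped function could be passed to `df['attribute_original_name'].map(...)` rather than a row-wise `apply`. How much this saves depends entirely on how many rows share the same raw string.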

PPS: This function is called in a lambda function as follows:

df['attribute_original_name'] = df.apply(
    lambda row: refine_attributes(row, p_list), axis=1
)

I've seen this thread and wonder whether I need to create a data frame here too: Python Fuzzy matching strings in list performance. Would that need one row per comparison, pairing every string in the input list with every string in the desired list?
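For reference, the every-input-against-every-reference comparison can be written without a data frame. The sketch below is illustrative only: `match_attributes` is a hypothetical name, the sample lists are made up, and `stand_in_ratio` uses the standard library's `difflib` purely so the example runs on its own. Its scores are not the same as `fuzz.partial_ratio`, which could be passed in as `scorer` instead.

```python
from difflib import SequenceMatcher


def stand_in_ratio(a, b):
    """Stand-in scorer on a 0-100 scale; NOT equivalent to fuzz.partial_ratio."""
    return 100 * SequenceMatcher(None, a, b).ratio()


def match_attributes(inputs, references, scorer=stand_in_ratio, threshold=90):
    """Score every input against every reference; keep references above threshold."""
    return [
        ref
        for value in inputs
        for ref in references
        if scorer(value, ref) > threshold
    ]


inputs = ['colours', 'height']              # hypothetical row values
references = ['colour', 'size', 'weight']   # hypothetical cleaned reference list
matched = match_attributes(inputs, references)
```

With both lists capped at about 20 strings, this is at most 20 × 20 = 400 scorer calls per row, so the cost is likely dominated by the number of rows rather than by the size of any single comparison.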

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
