Optimize the traversal of a column of a dataframe

I want to check a column of a dataframe for fuzzy duplicates using fuzzywuzzy. At the moment I iterate over the rows with two nested for loops:

from fuzzywuzzy import fuzz

for i in df['col']:
    for j in df['col']:
        ratio = fuzz.ratio(i, j)
        if ratio > 90:
            print("row duplicates")

The problem is that my dataframe contains 600,000 rows, and this code has a complexity of O(n²). Is there a lighter way of doing this?
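For reference, two cheap reductions of the work I already know about: drop exact duplicates before comparing, and compare each unordered pair only once instead of both (i, j) and (j, i), which roughly halves the comparisons (the overall complexity stays quadratic in the number of unique values). A minimal sketch, using the standard library's `difflib.SequenceMatcher` as a stand-in for `fuzz.ratio` (they use similar but not identical 0-100 similarity measures):

```python
from difflib import SequenceMatcher
from itertools import combinations

def find_fuzzy_duplicates(values, threshold=90):
    """Return pairs of values whose similarity ratio exceeds `threshold`.

    Two reductions over the naive double loop:
    - exact duplicates are removed first, so repeated rows cost nothing;
    - itertools.combinations visits each unordered pair once, halving
      the number of ratio computations.
    """
    unique = list(dict.fromkeys(values))  # removes exact dups, keeps order
    matches = []
    for a, b in combinations(unique, 2):
        # ratio() is in [0, 1]; scale to fuzz.ratio's 0-100 range
        ratio = SequenceMatcher(None, a, b).ratio() * 100
        if ratio > threshold:
            matches.append((a, b))
    return matches

print(find_fuzzy_duplicates(["apple", "apples", "apple", "pear"]))
```

This still compares every unique pair, so it only buys a constant factor; for 600,000 rows the real savings would have to come from avoiding most pairs altogether (e.g. grouping candidates by a cheap key before fuzzy matching), which is what I am asking about.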



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
