Match percentage of 2 strings - Spark SQL

I have a requirement to check the match percentage of 2 columns from a table.

For example:

Sample data:

ColA   ColB
AAB    Aab
AACC   Aacc
WER    Wer

from difflib import SequenceMatcher

def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()

spark.udf.register('similar', similar)

Output:

similar('AAB','Aab')
Out[16]: 0.3333333333333333
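The 0.33 figure follows from how `ratio()` is defined in `difflib`: 2*M/T, where M is the number of matched characters and T is the combined length of both strings. Matching is case-sensitive, so only the leading 'A' of 'AAB' lines up with 'Aab':

```python
from difflib import SequenceMatcher

# ratio() = 2*M/T. Here M = 1 (only the first 'A' matches, since
# 'a' != 'A' and 'b' != 'B') and T = 3 + 3 = 6, giving 2*1/6 = 0.333...
print(SequenceMatcher(None, 'AAB', 'Aab').ratio())  # 0.3333333333333333
```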

I am able to achieve the requirement using the SequenceMatcher library, but I am not able to use that function inside Spark SQL and get the error below. Is there any other way to achieve the same result?

df = spark.sql("SELECT ColA, ColB, similar(ColA, ColB) FROM test")
display(df)

Error: PythonException: 'AttributeError: 'SequenceMatcher' object has no attribute 'matching_blocks''



Solution 1:[1]

• SequenceMatcher accepts two strings plus an optional junk criterion.

• If either input string is None, this error occurs; empty strings work fine.

Make sure None inputs are replaced by empty strings.
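Following that advice, a None-safe version of the UDF might look like this (a sketch; the registration step assumes an active SparkSession named `spark`, as in the question):

```python
from difflib import SequenceMatcher

def similar(a, b):
    # Coalesce NULL inputs to empty strings: SequenceMatcher raises
    # AttributeError on None, but handles '' without issue.
    return SequenceMatcher(None, a or '', b or '').ratio()

# Re-register under the same name so the original query keeps working
# (assumes an active SparkSession `spark`):
# spark.udf.register('similar', similar)
```

Alternatively, the NULLs can be handled on the SQL side, e.g. `similar(coalesce(ColA, ''), coalesce(ColB, ''))`, leaving the original UDF unchanged.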

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: AbhishekKhandave-MT