Python programming: finding similar names in a list of names

I am working with a dataset of company names that may contain near-duplicates, i.e. entries that refer to the same company but are not written identically.

The list may contain: company A, but also c.o.m.p.a.n.y A or comp A.

Is there a Python script, using NLP for example, that can find similar names in a dataset?

Thanks in advance



Solution 1:[1]

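You can use spaCy to compute the similarity between two texts. The score is based on word vectors, so it reflects how close the texts are in meaning rather than in spelling.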

import spacy

nlp = spacy.load("en_core_web_md")  # use a model with word vectors (md or lg); the sm models do not ship them
doc1 = nlp("Coca-Cola")
doc2 = nlp("Pepsi")

doc3 = nlp("Company Coca-Cola")
doc4 = nlp("Company Pepsi-Cola")


print(doc1, "<->", doc2, doc1.similarity(doc2))
print(doc3, "<->", doc4, doc3.similarity(doc4))

This gives the following similarities:

Coca-Cola <-> Pepsi 0.6684898494102074
Company Coca-Cola <-> Company Pepsi-Cola 0.934960639746236
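For spelling variants like the ones in the question (c.o.m.p.a.n.y A, comp A), a character-level comparison may be a useful complement to word vectors. The following is only a minimal sketch using the standard library's difflib, not part of the original answer. The helper names normalize and similar_pairs, the sample list, and the 0.7 threshold are assumptions chosen for illustration and would need tuning on real data.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name):
    """Lowercase, drop punctuation, and collapse whitespace (hypothetical helper)."""
    name = name.lower()
    name = re.sub(r"[^a-z0-9\s]", "", name)  # removes dots, dashes, etc.
    return re.sub(r"\s+", " ", name).strip()


def similar_pairs(names, threshold=0.8):
    """Return (name_a, name_b, score) for every pair scoring at or above threshold."""
    pairs = []
    for a, b in combinations(names, 2):
        # ratio() is a character-level similarity in [0, 1]
        score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs


# Sample data taken from the question, plus one unrelated name
names = ["company A", "c.o.m.p.a.n.y A", "comp A", "Pepsi"]
for a, b, score in similar_pairs(names, threshold=0.7):
    print(a, "<->", b, score)
```

This compares every pair, so the cost grows quadratically with the number of names. For a large dataset you would probably want to group candidates first (for example by first letter or by a shared token) before comparing.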

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 PleSo