How do I calculate similarity between two words to detect if they are duplicates?
I have two words and I want to calculate the similarity between them so that I can rank how likely they are to be duplicates.
How do I achieve that using deep learning / NLP methods?
Solution 1:[1]
Here are a few approaches to tackle text similarity:
String-based approaches
Neural-based approaches
Machine Translation based approaches
- https://github.com/mjpost/sacrebleu/tree/master/sacrebleu
- https://github.com/Unbabel/MT-Telescope
- https://github.com/alvations/lightyear
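For the string-based family, a minimal sketch using only the Python standard library might look like the following. It assumes that character-level overlap is an acceptable proxy for "duplicate" (e.g. spelling variants); the function names are illustrative and do not come from any of the libraries linked above. It will not capture meaning, so `plant` and `factory` would score as very different.

```python
from difflib import SequenceMatcher


def ratio_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] based on matching character blocks (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter word in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def normalized_edit_similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    for pair in [("color", "colour"), ("plant", "factory")]:
        print(pair, ratio_similarity(*pair), normalized_edit_similarity(*pair))
```

Any threshold for calling two words "duplicates" would have to be chosen against your own data; the scores alone do not decide it.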
But before you consider which library to use to measure similarity, you should try to define what you want to measure when it comes to similarity:
Are you trying to find semantic similarity with syntactic difference?
The dog ate the biscuit vs The biscuit was eaten by the dog
Are you trying to find lexical semantic similarity?
This problem is driving me mad! vs This problem is making me angry!
Are you trying to find entailment instead of similarity?
I ate Chinese food for dinner vs I ate kungpao chicken for dinner
The ambiguity of "similarity" becomes even more complex when comparing individual words without context, e.g.
- plant vs factory
  - They can be similar if plant refers to an industrial plant
  - But they are dissimilar if plant refers to the living thing plant
- bank vs financial institute
  - They can be similar if bank refers to the place we deposit or withdraw cash
  - But they are dissimilar if bank refers to the river bank
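To make the sense-dependence concrete, here is a small sketch of cosine similarity, the measure commonly used to compare word vectors in neural-based approaches. The three-dimensional vectors below are invented toy numbers for illustration only, not the output of any real embedding model; the point is only that the score changes depending on which sense of `plant` the vector represents.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors (assumption: made up for this example).
TOY_VECTORS = {
    "factory": [0.9, 0.1, 0.0],
    "plant (industrial)": [0.8, 0.2, 0.1],
    "plant (living thing)": [0.0, 0.2, 0.9],
}

if __name__ == "__main__":
    factory = TOY_VECTORS["factory"]
    for sense in ("plant (industrial)", "plant (living thing)"):
        score = cosine_similarity(factory, TOY_VECTORS[sense])
        print(f"factory vs {sense}: {score:.3f}")
```

A single context-free vector per word has to blend both senses into one point, which is one reason comparing isolated words is harder than comparing words in a sentence.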
There are many other aspects of similarity that one can define, depending on the ultimate task that you want to perform with the similarity score.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | alvas |
