How do I calculate similarity between two words to detect if they are duplicates?

I have two words, and I want to calculate the similarity between them so that I can rank how likely they are to be duplicates.

How do I achieve that using deep learning / NLP methods?



Solution 1:[1]

Here are a few approaches to measuring text similarity:

  • String-based approaches
  • Neural-based approaches
  • Machine Translation based approaches
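As a rough illustration of the string-based family, here is a minimal Python sketch that scores two words with a normalized Levenshtein (edit) distance and with the standard library's difflib.SequenceMatcher.ratio(). The function names and the 0.8 duplicate threshold are hypothetical choices for this example, not values taken from any library; in practice you would tune a threshold on labeled duplicate pairs.

```python
from difflib import SequenceMatcher


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Map edit distance onto [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def matcher_similarity(a: str, b: str) -> float:
    """difflib's ratio: 2 * matched characters / total length, in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def is_probable_duplicate(a: str, b: str, threshold: float = 0.8) -> bool:
    # Hypothetical helper: the 0.8 threshold is an arbitrary example value.
    # Case is normalized first so "Color" and "color" compare as equal.
    return edit_similarity(a.lower(), b.lower()) >= threshold


for w1, w2 in [("colour", "color"), ("plant", "factory")]:
    print(w1, w2,
          round(edit_similarity(w1, w2), 3),
          round(matcher_similarity(w1, w2), 3),
          is_probable_duplicate(w1, w2))
```

Note that these scores only compare spelling: they will flag "colour"/"color" as near-duplicates but say nothing about whether two differently spelled words mean the same thing.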


But before you decide which library to use to measure similarity, you should define what exactly you want to measure when it comes to similarity:

Are you trying to find semantic similarity despite syntactic differences?

  • The dog ate the biscuit vs
  • The biscuit was eaten by the dog
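To see why surface overlap falls short for this first case, here is a small sketch of Jaccard similarity over lowercased word sets, a common string-based baseline. The naive regex tokenizer is an assumption made for brevity. The two sentences mean the same thing, yet they share only 3 of their 7 distinct tokens, because "ate" and "eaten" count as different strings.

```python
import re


def tokens(sentence: str) -> set:
    # Deliberately naive tokenizer: lowercase, keep runs of letters only.
    return set(re.findall(r"[a-z]+", sentence.lower()))


def jaccard(a: str, b: str) -> float:
    """|shared tokens| / |all distinct tokens|, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


score = jaccard("The dog ate the biscuit", "The biscuit was eaten by the dog")
print(round(score, 3))  # 0.429 (3 shared tokens out of 7 distinct)
```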

Are you trying to find lexical semantic similarity?

  • This problem is driving me mad! vs
  • This problem is making me angry!

Are you trying to find entailment instead of similarity?

  • I ate Chinese food for dinner vs
  • I ate kungpao chicken for dinner
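One practical difference is that entailment is directional: "I ate kungpao chicken" implies "I ate Chinese food", but not the other way round, whereas most similarity scores are symmetric. The sketch below illustrates that asymmetry with a tiny, hand-written is-a hierarchy; the HYPERNYM table and the is_a helper are hypothetical, and a real system would rely on a lexical resource or a trained entailment (NLI) model instead.

```python
# Hypothetical, hand-written "is-a" hierarchy used only for illustration.
HYPERNYM = {
    "kungpao chicken": "chinese food",
    "chinese food": "food",
}


def is_a(specific: str, general: str) -> bool:
    """True if `specific` is (transitively) a kind of `general`
    according to the toy HYPERNYM table."""
    node = specific
    while node is not None:
        if node == general:
            return True
        node = HYPERNYM.get(node)  # climb one level; None at the top
    return False


print(is_a("kungpao chicken", "chinese food"))  # True
print(is_a("chinese food", "kungpao chicken"))  # False
```

Swapping the arguments changes the answer, which a symmetric score such as cosine or Jaccard similarity cannot express.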

The notion of "similarity" becomes even more ambiguous when you compare individual words without context, e.g.:

  • plant vs factory

    • They are similar if plant refers to an industrial plant
    • But they are dissimilar if plant refers to a living plant
  • bank vs financial institution

    • They are similar if bank refers to the place where we deposit or withdraw cash
    • But they are dissimilar if bank refers to a river bank.
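This ambiguity also limits a common neural-based approach: looking up a fixed vector per word and comparing vectors with cosine similarity. The sketch below uses made-up 3-dimensional vectors (TOY_VECTORS is hypothetical and does not come from any trained model) to show the mechanics. Because a static lookup returns the same vector for "plant" in every sentence, the score cannot change with the intended sense, even though the hypothetical static_similarity function accepts a context argument.

```python
import math

# Hypothetical vectors invented for illustration only; NOT from a trained model.
TOY_VECTORS = {
    "plant": [0.7, 0.6, 0.1],
    "factory": [0.9, 0.1, 0.2],
    "tree": [0.1, 0.9, 0.1],
}


def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def static_similarity(word1, word2, context=None):
    # `context` is accepted but ignored: a static embedding has one vector
    # per word type, so the sentence cannot influence the score.
    return cosine(TOY_VECTORS[word1], TOY_VECTORS[word2])


industrial = static_similarity("plant", "factory", context="The car plant closed.")
botanical = static_similarity("plant", "factory", context="Water the plant daily.")
print(industrial == botanical)  # True: one score regardless of sense
```

Contextual encoders (for example, BERT-style models) instead compute a vector for each occurrence of a word from its surrounding sentence, which is what allows them to separate senses like these.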

There are many other aspects of similarity that you can define, depending on the task you ultimately want to use the similarity score for.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 alvas