Compare two strings and produce a score for how closely they match

I want to be able to validate that two objects match to a certain degree, say on a scale from 0 to 1.

So assume there is a Person object:

class Person
{
    public Guid Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int Age { get; set; }
}

Now, I have a database with verified Person details. For every transaction, new person data is acquired through OCR processing (for example from a credit card or another document), so it will likely differ slightly from the full data stored in the database: there might be missing characters, extra spaces, etc.

What I want is to compare the data acquired from OCR against a specific person from the database (I already know which Person object it is, so there is no need to check against all the records) and get a score. I could then test with real data and pick a threshold that is acceptable for validating the person.
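To make the idea concrete, here is a minimal sketch of the kind of scoring I have in mind, done in plain C# without any search engine. It assumes a normalized Levenshtein edit distance per field, which is only one possible similarity measure. The class name PersonMatcher, the whitespace/case normalization, and the 0.4/0.6 field weights are all hypothetical choices for illustration, not something I have validated.

```csharp
using System;

public static class PersonMatcher
{
    // Levenshtein edit distance: minimum number of single-character
    // insertions, deletions, and substitutions turning a into b.
    public static int Levenshtein(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            (prev, curr) = (curr, prev); // the finished row becomes "prev"
        }
        return prev[b.Length];
    }

    // Assumption: case and runs of whitespace are not significant for matching.
    private static string Normalize(string s) =>
        string.Join(" ", (s ?? "").ToUpperInvariant()
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries));

    // Similarity in [0, 1]; 1 means identical after normalization.
    public static double Similarity(string expected, string actual)
    {
        string a = Normalize(expected);
        string b = Normalize(actual);
        int maxLen = Math.Max(a.Length, b.Length);
        if (maxLen == 0) return 1.0;
        return 1.0 - (double)Levenshtein(a, b) / maxLen;
    }

    // Weighted overall score; the default weights are hypothetical and
    // would need tuning against real OCR output.
    public static double Score(string dbFirst, string dbLast,
                               string ocrFirst, string ocrLast,
                               double firstWeight = 0.4, double lastWeight = 0.6)
    {
        return firstWeight * Similarity(dbFirst, ocrFirst)
             + lastWeight * Similarity(dbLast, ocrLast);
    }
}
```

With this sketch, PersonMatcher.Similarity("John", "J0hn") gives 0.75 (one substitution over four characters), and the overall Score could then be compared against whatever threshold turns out to work on real data.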

Initially, I thought I could use ElasticSearch for this (I have never used it before, but I did some reading about it today): I would index the verified Person objects and then search for a given customer with rules like: it must have a matching Id, should match on FirstName, should match on LastName, etc. (with weights/boosting on particular fields). But the more I think about it, the more it looks like overengineering to me. So I have a few questions:

  1. Is ElasticSearch the right tool for this problem, given that I already know which verified person I want to fetch and compare against the OCR result, and all I need is an algorithm to evaluate how good that match is?
  2. If it is, which ES API should I use? I've looked at full-text matching, multi-field matching with query boosting and many others, but I'm not sure which (if any) would address my problem.
  3. If ES is not a good solution for this, is there any other open source tool/library that would be better suited to this kind of problem? I'm working in a .NET environment.


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
