Query similar full text in database
I am trying to match an OCR'd letter against a boilerplate template that will be stored in a database. I know how to compute a similarity score between two strings, but I don't want to iterate over the entire table and compare against every row. Is there a way to create an integer representation of a string so that the database could be queried with a range?
Ideally, I'd be able to take an OCR result of `This 1s a string` and, within a certain threshold, find a stored value in the database of `This is a string`. The documents will be full-page letters.
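As a point of reference, the pairwise comparison described above could look roughly like the sketch below. It uses Python's standard `difflib`; the helper name `is_similar` and the 0.9 threshold are illustrative assumptions, not part of any existing setup. It only scores one pair at a time, so on its own it does not avoid scanning the table.

```python
from difflib import SequenceMatcher


def is_similar(ocr_text: str, stored_text: str, threshold: float = 0.9) -> bool:
    """Return True when the two strings are at least `threshold` similar.

    `threshold` is an arbitrary illustrative value; tune it for real OCR noise.
    """
    ratio = SequenceMatcher(None, ocr_text, stored_text).ratio()
    return ratio >= threshold


if __name__ == "__main__":
    # One substituted character out of sixteen should still count as a match.
    print(is_similar("This 1s a string", "This is a string"))
    # An unrelated string should fall below the threshold.
    print(is_similar("This 1s a string", "Something else entirely"))
```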
My second thought is to create a boilerplate table and a child table that has a hash for each line of text. I could iterate through the OCR results and query for the hash value until I have a unique result.
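A minimal sketch of this second idea, assuming SQLite and SHA-1 purely for illustration. The table names (`boilerplate`, `boilerplate_line`), the whitespace and case normalization, and the helper functions are all hypothetical. Because the hashes are exact, a line containing an OCR error such as `This 1s a string` will not match its stored counterpart; the sketch simply skips such lines and relies on the remaining ones to narrow the candidates.

```python
import hashlib
import sqlite3


def line_hash(line: str) -> str:
    """Hash a line after collapsing whitespace and lowercasing (assumed normalization)."""
    normalized = " ".join(line.split()).lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def build_db(templates: dict) -> sqlite3.Connection:
    """Create the parent/child tables and store one hash per non-empty template line."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE boilerplate (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE boilerplate_line ("
        "boilerplate_id INTEGER REFERENCES boilerplate(id), hash TEXT)"
    )
    conn.execute("CREATE INDEX idx_line_hash ON boilerplate_line (hash)")
    for name, text in templates.items():
        cur = conn.execute("INSERT INTO boilerplate (name) VALUES (?)", (name,))
        rows = [
            (cur.lastrowid, line_hash(line))
            for line in text.splitlines()
            if line.strip()
        ]
        conn.executemany("INSERT INTO boilerplate_line VALUES (?, ?)", rows)
    return conn


def find_template(conn: sqlite3.Connection, ocr_text: str):
    """Narrow the candidate templates line by line; return a name once it is unique."""
    candidates = None
    for line in ocr_text.splitlines():
        if not line.strip():
            continue
        rows = conn.execute(
            "SELECT DISTINCT boilerplate_id FROM boilerplate_line WHERE hash = ?",
            (line_hash(line),),
        ).fetchall()
        ids = {row[0] for row in rows}
        # Keep the previous candidates when this line matches nothing useful
        # (for example, an OCR error or text that is not boilerplate).
        narrowed = ids if candidates is None else candidates & ids
        if narrowed:
            candidates = narrowed
        if candidates and len(candidates) == 1:
            (only,) = candidates
            row = conn.execute(
                "SELECT name FROM boilerplate WHERE id = ?", (only,)
            ).fetchone()
            return row[0]
    return None  # no unique match found
```

With two hypothetical templates that share a greeting and a sign-off, a clean OCR result resolves to one template as soon as a distinguishing line is reached, while a result whose only distinguishing line contains an OCR error stays ambiguous and returns `None`.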
Any suggestions?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
