Match new rows within the same table and score the match by percentage of columns matched

I have an Order table, e.g.:

OrderId, InvoiceId, ShipToId, Address, City, State, Zip, ProductId, UnitCost, TotalAmount, Tax, OrderDate

I feed data into this table from multiple sources: FTP (Excel, CSV), an API, or direct mobile app feeds. A typical daily feed is around 10,000 records, and our DB already holds on the order of a billion records.

I have a requirement that each new incoming record be checked against existing records for a similar one (matching on 2 or more columns). If one exists, I need to score the match and set the new transaction's score to a percentage based on the number of columns matched.
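To make the requirement concrete, here is a minimal sketch of the scoring rule in Python (the same logic ports directly to C#). It is only an illustration under stated assumptions: the names `match_score`, `best_match`, `normalize`, `COMPARE_COLUMNS`, and `MIN_MATCHED` are hypothetical, OrderId is assumed to be unique and is therefore excluded from the comparison, and two values count as matched only when they are equal after trimming and case-folding.

```python
from typing import Iterable, Mapping, Optional, Sequence, Tuple

# Columns compared between two orders. OrderId is left out on the
# assumption that it is unique per row and can never match.
COMPARE_COLUMNS: Sequence[str] = (
    "InvoiceId", "ShipToId", "Address", "City", "State", "Zip",
    "ProductId", "UnitCost", "TotalAmount", "Tax", "OrderDate",
)
MIN_MATCHED = 2  # "matched 2 or more columns"

Row = Mapping[str, object]


def normalize(value: object) -> str:
    """Trim and case-fold a value so 'Austin ' and 'austin' compare equal."""
    return "" if value is None else str(value).strip().casefold()


def match_score(new_row: Row, candidate: Row,
                columns: Sequence[str] = COMPARE_COLUMNS) -> float:
    """Percentage of columns with equal, non-empty values.

    Returns 0.0 when fewer than MIN_MATCHED columns match.
    """
    matched = 0
    for col in columns:
        left = normalize(new_row.get(col))
        right = normalize(candidate.get(col))
        if left != "" and left == right:
            matched += 1
    if matched < MIN_MATCHED:
        return 0.0
    return 100.0 * matched / len(columns)


def best_match(new_row: Row, candidates: Iterable[Row],
               columns: Sequence[str] = COMPARE_COLUMNS
               ) -> Tuple[Optional[Row], float]:
    """Return the highest-scoring candidate and its score, or (None, 0.0)."""
    best: Optional[Row] = None
    best_score = 0.0
    for candidate in candidates:
        score = match_score(new_row, candidate, columns)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```

With a billion existing rows, the `candidates` passed to `best_match` would presumably come from an indexed lookup on one or two columns (for example Zip or ShipToId) rather than a full table scan; that narrowing step is not shown here.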

What is the proper way to do this in .NET? What tools can I use? Any pointers would be greatly appreciated.

K.

I tried looking into AWS Glue, but I think it is going to be expensive for us. Anything that can help with deduplication and fuzzy matching on columns is what we need, I think.
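As an illustration of what fuzzy matching on a single column could look like, here is a small sketch using `difflib.SequenceMatcher` from the Python standard library. The helper names `similarity` and `fuzzy_equal` and the 0.85 threshold are assumptions chosen for the example, not a recommendation; a real pipeline would need a threshold tuned per column.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0.0, 1.0] after case-folding and collapsing whitespace."""
    a_norm = " ".join(a.casefold().split())
    b_norm = " ".join(b.casefold().split())
    return SequenceMatcher(None, a_norm, b_norm).ratio()


def fuzzy_equal(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two values as a match when their similarity reaches the threshold.

    The default threshold is an arbitrary assumption for this sketch.
    """
    return similarity(a, b) >= threshold
```

A column comparison such as Address could call `fuzzy_equal` instead of a strict equality check, while identifier columns such as ProductId would likely stay exact.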



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
