Find duplicates by Levenshtein distance
I'm trying to find duplicates using two conditions:
- the standard groupBy method (by name and type)
- and also by similarity on the name field.
All the articles I have found about this compare names against a single given string, but here I want to compare each record with all the other records in the same table.
So far I have tried the query below, which computes the distance correctly but does not let me filter the grouped results on it. PostgreSQL rejects it with this error:
ERROR: window functions are not allowed in HAVING LINE
This is the query:
$query->select('name', 'type', DB::raw('COUNT(*) as count'), DB::raw('LEVENSHTEIN(UPPER(name),UPPER(lag(name) OVER (order by name))) as distance'))
->groupBy('name', 'type')
->havingRaw('COUNT(*) > 1')
->orHavingRaw('LEVENSHTEIN(UPPER(name),UPPER(lag(name) OVER (order by name))) > 20');
How can I find duplicates using the Levenshtein distance on the same field across the whole table (PostgreSQL)?
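To make the goal concrete, here is a rough sketch of the kind of row-against-row comparison I have in mind, written as a self-join rather than with lag(), which only looks at the previous row. It is a sketch under assumptions and not a confirmed solution: the table name items, the id primary key and the threshold of 3 are all hypothetical, and levenshtein() is assumed to come from the fuzzystrmatch extension.

```sql
-- Assumption: the fuzzystrmatch extension is available; it provides levenshtein().
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;

-- "items" and "id" are hypothetical names; the threshold 3 is an arbitrary choice.
SELECT a.id,
       a.name,
       a.type,
       b.id   AS other_id,
       b.name AS other_name,
       levenshtein(upper(a.name), upper(b.name)) AS distance
FROM items a
JOIN items b
  ON a.type = b.type
 AND a.id < b.id          -- report each pair once and never pair a row with itself
WHERE levenshtein(upper(a.name), upper(b.name)) <= 3
ORDER BY distance, a.name;
```

Exact duplicates would show up with a distance of 0, so both conditions are covered by one query. The join compares every pair of rows with the same type, so the cost grows quadratically with the table size.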
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
