'Recursive SQL to eliminate duplicates?
I have a SQL, where I need to make assignments between ids, but only if there is exactling one "matching partner".
orig table:
| id_orig | id_new |
|---|---|
| 1 | 2 |
| 3 | 5 |
| 4 | 10 |
| 98 | 200 |
| 100 | 200 |
| 100 | 300 |
First I need to find a way, to update all the pairs, where an id_new references exactly one id_orig:
select id_orig, id_new from mytable QUALIFY COUNT(*) OVER (PARTITION BY id_new) = 1
this would result in the following resultset (1):
| id_orig | id_new |
|---|---|
| 1 | 2 |
| 3 | 5 |
| 4 | 10 |
| 100 | 300 |
The problem is, that when I now look at the "rest of the set" (which in this case means every id, that wasn't part of the first resultset), there are new rows, which now exactly have one "matching partner".
WITH cte as (
select id_orig, id_new
from mytable
QUALIFY COUNT(*) OVER (PARTITION BY id_new) = 1
)
select id_orig, id_new
from mytable
where id_orig not in (select id_orig from cte)
and id_new not in (select id_new from cte)
QUALIFY COUNT(*) OVER (PARTITION BY id_new) = 1
resultset 2:
| id_orig | id_new |
|---|---|
| 98 | 200 |
In reality, there could be many more steps needed to find all the "matching" partners, so my question is: is it possible to select the rows from resultset 1 and 2 with one query? Maybe recursively?
A brief summary of what I want to achieve:
- Find all rows in the set, where id_new is unique
- ( These ids are marked as "ready" in the source table, and will be ignored next time.)
- Same as step 1: find all rows in the set, where id_new is unique (but ignore id's from the first resultset.. and so on).
Since I don't know how many times I would need to run these steps, until every row is found, I was wondering, if it was possible to select every row with one query.
Solution 1:[1]
I'd try with the following approach:
WITH cte AS (
SELECT id_orig, id_new
FROM mytable
QUALIFY COUNT(*) OVER (PARTITION BY id_new) = 1
),
final AS (
SELECT id_orig, id_new
FROM mytable
LEFT JOIN cte
ON mytable.id_orig = cte.id_orig AND mytable.id_new = cte.id_new
WHERE cte.id_orig IS NULL AND cte.id_new IS NULL
)
SELECT * FROM final
This should return the non-matching fields between the first CTE and the rest of the set in mytable.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Aleix CC |
