'Recursive SQL to eliminate duplicates?

I have a SQL, where I need to make assignments between ids, but only if there is exactling one "matching partner".

orig table:

id_orig id_new
1 2
3 5
4 10
98 200
100 200
100 300

First I need to find a way, to update all the pairs, where an id_new references exactly one id_orig:

select id_orig, id_new from mytable QUALIFY COUNT(*) OVER (PARTITION BY id_new) = 1

this would result in the following resultset (1):

id_orig id_new
1 2
3 5
4 10
100 300

The problem is, that when I now look at the "rest of the set" (which in this case means every id, that wasn't part of the first resultset), there are new rows, which now exactly have one "matching partner".

WITH cte as (
select id_orig, id_new
from mytable
QUALIFY COUNT(*) OVER (PARTITION BY id_new) = 1
) 
select id_orig, id_new
from mytable
where id_orig not in (select id_orig from cte)
and id_new not in (select id_new from cte)
QUALIFY COUNT(*) OVER (PARTITION BY id_new) = 1

resultset 2:

id_orig id_new
98 200

In reality, there could be many more steps needed to find all the "matching" partners, so my question is: is it possible to select the rows from resultset 1 and 2 with one query? Maybe recursively?

A brief summary of what I want to achieve:

  1. Find all rows in the set, where id_new is unique
  2. ( These ids are marked as "ready" in the source table, and will be ignored next time.)
  3. Same as step 1: find all rows in the set, where id_new is unique (but ignore id's from the first resultset.. and so on).

Since I don't know how many times I would need to run these steps, until every row is found, I was wondering, if it was possible to select every row with one query.



Solution 1:[1]

I'd try with the following approach:

WITH cte AS (
  SELECT id_orig, id_new
  FROM mytable
  QUALIFY COUNT(*) OVER (PARTITION BY id_new) = 1
),
final AS (
  SELECT id_orig, id_new
  FROM mytable
  LEFT JOIN cte 
    ON mytable.id_orig = cte.id_orig AND mytable.id_new = cte.id_new
  WHERE cte.id_orig IS NULL AND cte.id_new IS NULL
)
SELECT * FROM final

This should return the non-matching fields between the first CTE and the rest of the set in mytable.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Aleix CC