'Best practice to find emails and phones related in a customers table
Let say I have the following customers table:
| row | Phone | group | |
|---|---|---|---|
| 1 | [email protected] | +135698887445 | 1 |
| 2 | [email protected] | +145698834549 | 2 |
I would like to group customers that have the same email or same phone (or both).
The tricky part is that with incremental loading once a new customer row is been added, earlier identities that grouped to different groupIds could now be joined to the same group like so:
| row | Phone | group | |
|---|---|---|---|
| 1 | [email protected] | +135698887445 | 1 |
| 2 | [email protected] | +145698834549 | 2--changed to group 1 |
| 3 | [email protected] | +145698834549 | 1 |
In the last example you can notice that the third row has email from the first row and cellphone from the second row so it links the above rows to the same group.
How should this be done in TSQL? And is it possible to keep the groupids numbers while changing early group number only when needed?
Solution 1:[1]
First, define that if multiple rows are actually of the same group, the smallest group will prevail. We then just need to loop updates until no group decrease is found:
declare @rc bigint=1 -- to go through first iteration check
declare @counter_safety_max=100
declare @counter=1
while(@rc>0 and @counter<@counter_safety_max)
begin
set @counter+=1
update current
set group=lowest.group
from
customer current
cross apply
(
select top 1 group
from customer cl
where (cl.phone=current.phone or cl.email=current.email) and cl.group<current.group
order by group asc
) as lowest
set @rc=@@rowcount
end
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | George Menoutis |
