'Having difficulty generating distinct rows without duplicates in a SELF JOIN (SQL)

I'm pretty new to SQL and thought I'd be able to use it to create a list for my employer's clients. Unfortunately, there the majority of clients have more than one account and the file has a distinct row for each account.

I was trying to use a self join to create one row for each client with multiple columns for the accounts.

SELECT DISTINCT A.Account_Number AS Account_1, B.Account_Number AS Account_2, A.Client_Name
FROM client_table AS A, client_table AS B
WHERE A.Account_Number <> B.Account_Number
AND A.Client_Name = B.Client_Name
ORDER BY A.Client_Name;

Unfortunately, the results were such that I would get a table that looks like:

Account_1 Account_2 Client_name
000001 000002 Joe Shmo
000001 000003 Joe Shmo
000002 000003 Joe Shmo
000002 000001 Joe Shmo

I understand that for more than two accounts, I'll need more than two joins, but I haven't figured out how to do it.

Is there a way to prevent double entry?

I'm using BigQuery btw.



Solution 1:[1]

You can do this with LEFT JOIN. Repeat as necessary for however many accounts a user may have.

SELECT DISTINCT ct.Client_Name, ct1.Account_Number AS Account_1, ct2.Account_Number AS Account_2,
                ct3.Account_Number AS Account_3, ct4.Account_Number AS Account_4
FROM client_table ct
LEFT JOIN client_table ct1 ON ct1.Client_Name = ct.Client_Name AND ct1.Account_Number = (SELECT MIN(Account_Number) FROM client_table WHERE Client_Name = ct.Client_Name)
LEFT JOIN client_table ct2 ON ct2.Client_Name = ct.Client_Name AND ct2.Account_Number = (SELECT MIN(Account_Number) FROM client_table WHERE Client_Name = ct.Client_Name AND Account_Number > ct1.Account_Number)
LEFT JOIN client_table ct3 ON ct3.Client_Name = ct.Client_Name AND ct3.Account_Number = (SELECT MIN(Account_Number) FROM client_table WHERE Client_Name = ct.Client_Name AND Account_Number > ct2.Account_Number)
LEFT JOIN client_table ct4 ON ct4.Client_Name = ct.Client_Name AND ct4.Account_Number = (SELECT MIN(Account_Number) FROM client_table WHERE Client_Name = ct.Client_Name AND Account_Number > ct3.Account_Number)

Solution 2:[2]

You can use GROUP_CONCAT() to fetch a comma seperated list.
Or you can use RANK()to order your accounts in a CTEand then list them.
Attention : any 7th accounts or above will not be shown!

create table clients (
cname varchar(10),
account char(6));
insert into clients values
('Joe Shmo','000001'),
('Joe Shmo','000002'),
('Joe Shmo','000003'),
('Joe Shmo','000004');
select 
   cname "Name", 
   group_concat(account ) account_numbers
from clients
group by cname
order by cname;

Name     | account_numbers            
:------- | :--------------------------
Joe Shmo | 000001,000002,000003,000004
with c as 
(select
cname,
account,
rank() over (partition by cname order by account) ranking
from clients)
select 
coalesce(max(case when ranking=1 then account end),'   -') account1,
coalesce(max(case when ranking=2 then account end),'   -') account2,
coalesce(max(case when ranking=3 then account end),'   -') account3,
coalesce(max(case when ranking=4 then account end),'   -') account4,
coalesce(max(case when ranking=5 then account end),'   -') account5,
coalesce(max(case when ranking=6 then account end),'   -') account6
from c
group by cname
account1 | account2 | account3 | account4 | account5 | account6
:------- | :------- | :------- | :------- | :------- | :-------
000001   | 000002   | 000003   | 000004   |    -     |    -    

db<>fiddle here

Please see the following post for more information on group_concat in bigQuery: BigQuery GROUP_CONCAT and ORDER BY

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2