'Having difficulty generating distinct rows without duplicates in a SELF JOIN (SQL)
I'm pretty new to SQL and thought I'd be able to use it to create a list for my employer's clients. Unfortunately, there the majority of clients have more than one account and the file has a distinct row for each account.
I was trying to use a self join to create one row for each client with multiple columns for the accounts.
SELECT DISTINCT A.Account_Number AS Account_1, B.Account_Number AS Account_2, A.Client_Name
FROM client_table AS A, client_table AS B
WHERE A.Account_Number <> B.Account_Number
AND A.Client_Name = B.Client_Name
ORDER BY A.Client_Name;
Unfortunately, the results were such that I would get a table that looks like:
| Account_1 | Account_2 | Client_name |
|---|---|---|
| 000001 | 000002 | Joe Shmo |
| 000001 | 000003 | Joe Shmo |
| 000002 | 000003 | Joe Shmo |
| 000002 | 000001 | Joe Shmo |
I understand that for more than two accounts, I'll need more than two joins, but I haven't figured out how to do it.
Is there a way to prevent double entry?
I'm using BigQuery btw.
Solution 1:[1]
You can do this with LEFT JOIN. Repeat as necessary for however many accounts a user may have.
SELECT DISTINCT ct.Client_Name, ct1.Account_Number AS Account_1, ct2.Account_Number AS Account_2,
ct3.Account_Number AS Account_3, ct4.Account_Number AS Account_4
FROM client_table ct
LEFT JOIN client_table ct1 ON ct1.Client_Name = ct.Client_Name AND ct1.Account_Number = (SELECT MIN(Account_Number) FROM client_table WHERE Client_Name = ct.Client_Name)
LEFT JOIN client_table ct2 ON ct2.Client_Name = ct.Client_Name AND ct2.Account_Number = (SELECT MIN(Account_Number) FROM client_table WHERE Client_Name = ct.Client_Name AND Account_Number > ct1.Account_Number)
LEFT JOIN client_table ct3 ON ct3.Client_Name = ct.Client_Name AND ct3.Account_Number = (SELECT MIN(Account_Number) FROM client_table WHERE Client_Name = ct.Client_Name AND Account_Number > ct2.Account_Number)
LEFT JOIN client_table ct4 ON ct4.Client_Name = ct.Client_Name AND ct4.Account_Number = (SELECT MIN(Account_Number) FROM client_table WHERE Client_Name = ct.Client_Name AND Account_Number > ct3.Account_Number)
Solution 2:[2]
You can use GROUP_CONCAT() to fetch a comma seperated list.
Or you can use RANK()to order your accounts in a CTEand then list them.
Attention : any 7th accounts or above will not be shown!
create table clients ( cname varchar(10), account char(6)); insert into clients values ('Joe Shmo','000001'), ('Joe Shmo','000002'), ('Joe Shmo','000003'), ('Joe Shmo','000004');
select cname "Name", group_concat(account ) account_numbers from clients group by cname order by cname;
Name | account_numbers :------- | :-------------------------- Joe Shmo | 000001,000002,000003,000004
with c as (select cname, account, rank() over (partition by cname order by account) ranking from clients) select coalesce(max(case when ranking=1 then account end),' -') account1, coalesce(max(case when ranking=2 then account end),' -') account2, coalesce(max(case when ranking=3 then account end),' -') account3, coalesce(max(case when ranking=4 then account end),' -') account4, coalesce(max(case when ranking=5 then account end),' -') account5, coalesce(max(case when ranking=6 then account end),' -') account6 from c group by cname
account1 | account2 | account3 | account4 | account5 | account6 :------- | :------- | :------- | :------- | :------- | :------- 000001 | 000002 | 000003 | 000004 | - | -
db<>fiddle here
Please see the following post for more information on group_concat in bigQuery: BigQuery GROUP_CONCAT and ORDER BY
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 |
