'Need help in forming a SQL query
We have 2 tables called tbl1 and tbl2. It contains columns such as Visit_ID, Customer ID, and so on. There are instances where a Visit_ID will be associated with multiple Customer IDs.
For example, if customer logs into a website, a unique Visit_ID will be generated for each time he visits the website.
In one visit, multiple customers can login to their accounts and make individual purchases.
There are instances where a visit will be associated to multiple customer IDs. If there are more than 2 instances, append any other retail customer ID instances in this column.
For instances there are visit, which had 200 Customer IDs attached to that visit.
For example, if there are 7 Customer IDs in 1 visit, for Customer 1, it should have the first customer 1. For Customer 2, we will need to display the 2nd customer ID.
For 3rd to 7, all those 5 will be comma separated.
Can someone help how to frame a SQL query using this logic?
with CTE as (
SELECT
visit_id,
B.visitpg_nbr::INT AS visitpg_nbr,
CUSTOMER_ID,
dense_rank()over( PARTITION BY VISIT_ID order by CUSTOMER_ID) as rank
from
db_name.schema_name.tbl_1 A
JOIN db_name.schema_name.tbl_2 B
ON B.id_column = A.id_column
JOIN db_name.schema_name.tbl_3 C
ON CAST(C.xid as VARCHAR)= A.CUSTOMER_ID
WHERE flg_col = '0'
AND so_cd NOT IN ('0','1','2','3')
AND DATE_COL = '2022-01-17'
and visit_id='12345'
ORDER BY visitpg_nbr
)
select VISIT_ID, arr[0], arr[1], array_to_string( array_slice(arr, 2, 99999), ', ')
from (
select VISIT_ID, array_agg(distinct CUSTOMER_ID) within group(order by CUSTOMER_ID) arr
from CTE
group by 1
);
Thanks for those who have responded. I really appreciate their guidance. The logic worked fine. When I'm joining 3 tables inside CTE, I'm getting lot of duplicates. I want to eliminate the duplicate values.
When I run the below query that I have included inside CTE, I'm getting records which are duplicates.
SELECT
visit_id,
B.visitpg_nbr::INT AS visitpg_nbr,
CUSTOMER_ID,
dense_rank()over( PARTITION BY VISIT_ID order by CUSTOMER_ID) as rank
from
db_name.schema_name.tbl_1 A
JOIN db_name.schema_name.tbl_2 B
ON B.id_column = A.id_column
JOIN db_name.schema_name.tbl_3 C
ON CAST(C.xid as VARCHAR)= A.CUSTOMER_ID
WHERE flg_col = '0'
AND so_cd NOT IN ('0','1','2','3')
AND DATE_COL = '2022-01-17'
and visit_id='12345'
ORDER BY visitpg_nbr
Row VISIT_ID CUSTOMER_ID VISITPG_NBR RANK
**1 12345 100 1 1**
2 12345 100 2 1
3 12345 100 3 1
4 12345 100 4 1
5 12345 100 5 1
**6 67891 101 6 2**
7 67891 101 7 2
8 67891 101 8 2
9 67891 101 9 2
10 67891 101 10 2
**11 78910 102 11 3**
12 78910 102 12 3
13 78910 102 13 3
14 78910 102 14 3
Is there any logic to display the distinct results in the CTE temp table?
The final result should be populated as below.
VISIT_ID First_Customer Second_Customer Other_Customers
1 100 101 102,103,104,105,106
2 200 201 202,203,204,205
First Customer_ID should get displayed in the First_Customer column, Second_Customer_Id should get displayed in Second_Customer column.. All the other customer_ids should be displayed in the final column and it should be comma separated.
Also, I wanted the results to be ordered by visitpg_nbr
Solution 1:[1]
You should be able to get this with array_agg(), and then choosing the first, second, and subsequent (array_slice()) elements:
with data as (
select *
from snowflake_sample_data.tpch_sf100.orders
where o_custkey between 5411266 and 5411290
)
select o_custkey, arr[0], arr[1], array_to_string(array_slice(arr, 2, 99999), ', ')
from (
select o_custkey, array_agg(o_orderkey) within group(order by o_orderdate) arr
from data
group by 1
);
You might need to get unique ids in case there are many, you can solve that with a subquery before array_agg().
Solution 2:[2]
slightly different to Felipe's answer, not sure which would be more performant. I suspect his, but anyways here is another way to try it.
SELECT visit_id, first_customer, second_customer
,array_agg(other_ids) within group (order by order_id) as other_customer
FROM(
SELECT visit_id,
order_id,
first_value(customer_id) over (partition by visit_id order by order_id) as first_customer,
first_value(customer_id) over (partition by visit_id order by order_id) as second_customer,
IFF(row_number() over (partition by visit_id order by order_id) > 2, customer_id, null) as other_ids
FROM VALUES
(1,100, 1),
(1,101, 2),
(1,102, 3),
(1,103, 5),
(1,104, 6),
(1,105, 6),
(1,106, 7),
(2,200, 1),
(2,201, 2),
(2,202, 3),
(2,203, 4)
v(visit_id, customer_id, order_id)
)
GROUP BY 1,2,3
ORDER BY 1,2,3;
| VISIT_ID | FIRST_CUSTOMER | SECOND_CUSTOMER | OTHER_CUSTOMER |
|---|---|---|---|
| 1 | 100 | 100 | [ 102, 103, 104, 105, 106 ] |
| 2 | 200 | 200 | [ 202, 203 ] |
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Felipe Hoffa |
| Solution 2 | Simeon Pilgrim |

