'Need help in forming a SQL query

We have 2 tables called tbl1 and tbl2. It contains columns such as Visit_ID, Customer ID, and so on. There are instances where a Visit_ID will be associated with multiple Customer IDs.

For example, if customer logs into a website, a unique Visit_ID will be generated for each time he visits the website.

In one visit, multiple customers can login to their accounts and make individual purchases.

There are instances where a visit will be associated to multiple customer IDs. If there are more than 2 instances, append any other retail customer ID instances in this column.

For instances there are visit, which had 200 Customer IDs attached to that visit.

For example, if there are 7 Customer IDs in 1 visit, for Customer 1, it should have the first customer 1. For Customer 2, we will need to display the 2nd customer ID.

For 3rd to 7, all those 5 will be comma separated.

Can someone help how to frame a SQL query using this logic?

   with CTE as (
    SELECT
    visit_id,
    B.visitpg_nbr::INT AS visitpg_nbr,
     CUSTOMER_ID,
  dense_rank()over( PARTITION BY VISIT_ID order by CUSTOMER_ID) as rank
from
    db_name.schema_name.tbl_1 A
    JOIN db_name.schema_name.tbl_2 B
    ON B.id_column  = A.id_column 
    JOIN db_name.schema_name.tbl_3 C
    ON CAST(C.xid as VARCHAR)= A.CUSTOMER_ID
    WHERE flg_col = '0'
    AND so_cd NOT IN ('0','1','2','3')
    AND DATE_COL = '2022-01-17'
    and visit_id='12345'
      ORDER BY visitpg_nbr
    )
    
    select  VISIT_ID, arr[0], arr[1], array_to_string( array_slice(arr, 2, 99999), ', ')
    from (
    select VISIT_ID, array_agg(distinct CUSTOMER_ID) within group(order by CUSTOMER_ID) arr
    from CTE
    group by 1
    );

Thanks for those who have responded. I really appreciate their guidance. The logic worked fine. When I'm joining 3 tables inside CTE, I'm getting lot of duplicates. I want to eliminate the duplicate values.

When I run the below query that I have included inside CTE, I'm getting records which are duplicates.

SELECT
        visit_id,
        B.visitpg_nbr::INT AS visitpg_nbr,
         CUSTOMER_ID,
      dense_rank()over( PARTITION BY VISIT_ID order by CUSTOMER_ID) as rank
    from
        db_name.schema_name.tbl_1 A
        JOIN db_name.schema_name.tbl_2 B
        ON B.id_column  = A.id_column 
        JOIN db_name.schema_name.tbl_3 C
        ON CAST(C.xid as VARCHAR)= A.CUSTOMER_ID
        WHERE flg_col = '0'
        AND so_cd NOT IN ('0','1','2','3')
        AND DATE_COL = '2022-01-17'
        and visit_id='12345'
          ORDER BY visitpg_nbr

Row      VISIT_ID    CUSTOMER_ID          VISITPG_NBR     RANK 
**1         12345      100                      1                1**
2         12345      100                      2                1
3         12345      100                      3                1
4         12345      100                      4                1
5         12345      100                      5                1

**6         67891      101                      6                2**
7         67891      101                      7                2
8         67891      101                      8                2
9         67891      101                      9                2
10        67891      101                      10               2

**11        78910      102                      11               3**
12        78910      102                      12               3
13        78910      102                      13               3
14        78910      102                      14               3

Is there any logic to display the distinct results in the CTE temp table?

The final result should be populated as below.

VISIT_ID    First_Customer Second_Customer  Other_Customers

1            100            101             102,103,104,105,106
2            200            201             202,203,204,205

First Customer_ID should get displayed in the First_Customer column, Second_Customer_Id should get displayed in Second_Customer column.. All the other customer_ids should be displayed in the final column and it should be comma separated.

Also, I wanted the results to be ordered by visitpg_nbr



Solution 1:[1]

You should be able to get this with array_agg(), and then choosing the first, second, and subsequent (array_slice()) elements:

with data as (
    select *
    from snowflake_sample_data.tpch_sf100.orders
    where o_custkey between 5411266 and 5411290
)

select o_custkey, arr[0], arr[1], array_to_string(array_slice(arr, 2, 99999), ', ')
from (
    select o_custkey, array_agg(o_orderkey) within group(order by o_orderdate) arr
    from data
    group by 1 
);

You might need to get unique ids in case there are many, you can solve that with a subquery before array_agg().

enter image description here

Solution 2:[2]

slightly different to Felipe's answer, not sure which would be more performant. I suspect his, but anyways here is another way to try it.

SELECT visit_id, first_customer, second_customer 
    ,array_agg(other_ids) within group (order by order_id) as other_customer
FROM(
    SELECT visit_id,
        order_id,
        first_value(customer_id) over (partition by visit_id order by order_id) as first_customer,
        first_value(customer_id) over (partition by visit_id order by order_id) as second_customer,
        IFF(row_number() over (partition by visit_id order by order_id) > 2, customer_id, null) as other_ids
    FROM VALUES
        (1,100, 1),
        (1,101, 2),
        (1,102, 3),
        (1,103, 5),
        (1,104, 6),
        (1,105, 6),
        (1,106, 7),
        (2,200, 1),
        (2,201, 2),
        (2,202, 3),
        (2,203, 4)
        v(visit_id, customer_id, order_id)
    )
GROUP BY 1,2,3
ORDER BY 1,2,3;
VISIT_ID FIRST_CUSTOMER SECOND_CUSTOMER OTHER_CUSTOMER
1 100 100 [ 102, 103, 104, 105, 106 ]
2 200 200 [ 202, 203 ]

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Felipe Hoffa
Solution 2 Simeon Pilgrim