'RANK() OVER Partition with resetting in dependence of another column

I have in a table 2 different IDs and the timestamp, which I would like to rank. But the peculiarity is that I want to rank the S_ID until there is an entry at O_ID. Once there is an entry at O_ID, I want the next rank at S_ID to start at 1.

Here is an example:

select 
    S_ID,
    timestamp,
    O_ID,
    rank() OVER (PARTITION BY S_ID ORDER BY timestamp asc) AS RANK
from table
order by S_ID, timestamp;
S_ID Timestamp O_ID Rank
2e114e9f 2021-11-26 08:57:44.049 NULL 1
2e114e9f 2021-12-26 17:07:26.272 NULL 2
2e114e9f 2021-12-27 08:13:24.277 NULL 3
2e114e9f 2021-12-29 11:32:56.952 2287549 4
2e114e9f 2021-12-30 13:41:28.821 NULL 5
2e114e9f 2021-12-30 19:53:28.590 NULL 6
2e114e9f 2022-02-05 09:50:54.104 2333002 7
2e114e9f 2022-02-19 10:14:31.389 NULL 8

How can I now add another rank in dependence of an entry in the column O_ID? So the outcome should be:

S_ID Timestamp O_ID Rank S_ID Rank both
2e114e9f 2021-11-26 08:57:44.049 NULL 1 1
2e114e9f 2021-12-26 17:07:26.272 NULL 2 2
2e114e9f 2021-12-27 08:13:24.277 NULL 3 3
2e114e9f 2021-12-29 11:32:56.952 2287549 4 4
2e114e9f 2021-12-30 13:41:28.821 NULL 5 1
2e114e9f 2021-12-30 19:53:28.590 NULL 6 2
2e114e9f 2022-02-05 09:50:54.104 2333002 7 3
2e114e9f 2022-02-19 10:14:31.389 NULL 8 1

I am happy about any food for thought!!!!



Solution 1:[1]

Looks like the gaps and islands approach can be helpful here - use lag to split data into groups (based on current and previous equality with some null handling) and then use group value as partition for the rank() function.

-- sample data
WITH dataset (S_ID, Timestamp, O_ID) AS (
    VALUES ('2e114e9f', timestamp '2021-11-26 08:57:44.049',    NULL),
    ('2e114e9f',    timestamp '2021-12-26 17:07:26.272',    NULL),
    ('2e114e9f',    timestamp '2021-12-27 08:13:24.277',    NULL),
    ('2e114e9f',    timestamp '2021-12-29 11:32:56.952',    2287549),
    ('2e114e9f',    timestamp '2021-12-30 13:41:28.821',    NULL),
    ('2e114e9f',    timestamp '2021-12-30 19:53:28.590',    NULL),
    ('2e114e9f',    timestamp '2022-02-05 09:50:54.104',    2333002),
    ('2e114e9f',    timestamp '2022-02-19 10:14:31.389',    NULL)
) 

--query
select S_ID,
    Timestamp,
    O_ID,
    rank() OVER (PARTITION BY S_ID, grp ORDER BY timestamp asc) AS RANK
from(
        select *,
            sum(if(prev is not null and (O_ID is null or O_ID != prev), 1, 0)) 
                OVER (PARTITION BY S_ID ORDER BY timestamp asc) as grp
        from (
                select *,
                    lag(O_ID) OVER (PARTITION BY S_ID ORDER BY timestamp asc) AS prev
                from dataset
            )
    )

Output:

S_ID Timestamp O_ID RANK
2e114e9f 2021-11-26 08:57:44.049 1
2e114e9f 2021-12-26 17:07:26.272 2
2e114e9f 2021-12-27 08:13:24.277 3
2e114e9f 2021-12-29 11:32:56.952 2287549 4
2e114e9f 2021-12-30 13:41:28.821 1
2e114e9f 2021-12-30 19:53:28.590 2
2e114e9f 2022-02-05 09:50:54.104 2333002 3
2e114e9f 2022-02-19 10:14:31.389 1

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Guru Stron