'RANK() OVER Partition with resetting in dependence of another column

I have in a table 2 different IDs and the timestamp, which I would like to rank. But the peculiarity is that I want to rank the S_ID until there is an entry at O_ID. Once there is an entry at O_ID, I want the next rank at S_ID to start at 1.

Here is an example:

select 
    S_ID,
    timestamp,
    O_ID,
    rank() OVER (PARTITION BY S_ID ORDER BY timestamp asc) AS RANK
from table
order by S_ID, timestamp;

S_ID	Timestamp	O_ID	Rank
2e114e9f	2021-11-26 08:57:44.049	NULL	1
2e114e9f	2021-12-26 17:07:26.272	NULL	2
2e114e9f	2021-12-27 08:13:24.277	NULL	3
2e114e9f	2021-12-29 11:32:56.952	2287549	4
2e114e9f	2021-12-30 13:41:28.821	NULL	5
2e114e9f	2021-12-30 19:53:28.590	NULL	6
2e114e9f	2022-02-05 09:50:54.104	2333002	7
2e114e9f	2022-02-19 10:14:31.389	NULL	8

How can I now add another rank in dependence of an entry in the column O_ID? So the outcome should be:

S_ID	Timestamp	O_ID	Rank S_ID	Rank both
2e114e9f	2021-11-26 08:57:44.049	NULL	1	1
2e114e9f	2021-12-26 17:07:26.272	NULL	2	2
2e114e9f	2021-12-27 08:13:24.277	NULL	3	3
2e114e9f	2021-12-29 11:32:56.952	2287549	4	4
2e114e9f	2021-12-30 13:41:28.821	NULL	5	1
2e114e9f	2021-12-30 19:53:28.590	NULL	6	2
2e114e9f	2022-02-05 09:50:54.104	2333002	7	3
2e114e9f	2022-02-19 10:14:31.389	NULL	8	1

I am happy about any food for thought!!!!

presto

Solution 1:^[1]

Looks like the gaps and islands approach can be helpful here - use lag to split data into groups (based on current and previous equality with some null handling) and then use group value as partition for the rank() function.

-- sample data
WITH dataset (S_ID, Timestamp, O_ID) AS (
    VALUES ('2e114e9f', timestamp '2021-11-26 08:57:44.049',    NULL),
    ('2e114e9f',    timestamp '2021-12-26 17:07:26.272',    NULL),
    ('2e114e9f',    timestamp '2021-12-27 08:13:24.277',    NULL),
    ('2e114e9f',    timestamp '2021-12-29 11:32:56.952',    2287549),
    ('2e114e9f',    timestamp '2021-12-30 13:41:28.821',    NULL),
    ('2e114e9f',    timestamp '2021-12-30 19:53:28.590',    NULL),
    ('2e114e9f',    timestamp '2022-02-05 09:50:54.104',    2333002),
    ('2e114e9f',    timestamp '2022-02-19 10:14:31.389',    NULL)
) 

--query
select S_ID,
    Timestamp,
    O_ID,
    rank() OVER (PARTITION BY S_ID, grp ORDER BY timestamp asc) AS RANK
from(
        select *,
            sum(if(prev is not null and (O_ID is null or O_ID != prev), 1, 0)) 
                OVER (PARTITION BY S_ID ORDER BY timestamp asc) as grp
        from (
                select *,
                    lag(O_ID) OVER (PARTITION BY S_ID ORDER BY timestamp asc) AS prev
                from dataset
            )
    )

Output:

S_ID	Timestamp	O_ID	RANK
2e114e9f	2021-11-26 08:57:44.049		1
2e114e9f	2021-12-26 17:07:26.272		2
2e114e9f	2021-12-27 08:13:24.277		3
2e114e9f	2021-12-29 11:32:56.952	2287549	4
2e114e9f	2021-12-30 13:41:28.821		1
2e114e9f	2021-12-30 19:53:28.590		2
2e114e9f	2022-02-05 09:50:54.104	2333002	3
2e114e9f	2022-02-19 10:14:31.389		1

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Guru Stron

'RANK() OVER Partition with resetting in dependence of another column

Solution 1:[1]

Sources

Related Questions

Solution 1:^[1]