'RANK() OVER Partition with resetting in dependence of another column
I have in a table 2 different IDs and the timestamp, which I would like to rank. But the peculiarity is that I want to rank the S_ID until there is an entry at O_ID. Once there is an entry at O_ID, I want the next rank at S_ID to start at 1.
Here is an example:
select
S_ID,
timestamp,
O_ID,
rank() OVER (PARTITION BY S_ID ORDER BY timestamp asc) AS RANK
from table
order by S_ID, timestamp;
| S_ID | Timestamp | O_ID | Rank |
|---|---|---|---|
| 2e114e9f | 2021-11-26 08:57:44.049 | NULL | 1 |
| 2e114e9f | 2021-12-26 17:07:26.272 | NULL | 2 |
| 2e114e9f | 2021-12-27 08:13:24.277 | NULL | 3 |
| 2e114e9f | 2021-12-29 11:32:56.952 | 2287549 | 4 |
| 2e114e9f | 2021-12-30 13:41:28.821 | NULL | 5 |
| 2e114e9f | 2021-12-30 19:53:28.590 | NULL | 6 |
| 2e114e9f | 2022-02-05 09:50:54.104 | 2333002 | 7 |
| 2e114e9f | 2022-02-19 10:14:31.389 | NULL | 8 |
How can I now add another rank in dependence of an entry in the column O_ID? So the outcome should be:
| S_ID | Timestamp | O_ID | Rank S_ID | Rank both |
|---|---|---|---|---|
| 2e114e9f | 2021-11-26 08:57:44.049 | NULL | 1 | 1 |
| 2e114e9f | 2021-12-26 17:07:26.272 | NULL | 2 | 2 |
| 2e114e9f | 2021-12-27 08:13:24.277 | NULL | 3 | 3 |
| 2e114e9f | 2021-12-29 11:32:56.952 | 2287549 | 4 | 4 |
| 2e114e9f | 2021-12-30 13:41:28.821 | NULL | 5 | 1 |
| 2e114e9f | 2021-12-30 19:53:28.590 | NULL | 6 | 2 |
| 2e114e9f | 2022-02-05 09:50:54.104 | 2333002 | 7 | 3 |
| 2e114e9f | 2022-02-19 10:14:31.389 | NULL | 8 | 1 |
I am happy about any food for thought!!!!
Solution 1:[1]
Looks like the gaps and islands approach can be helpful here - use lag to split data into groups (based on current and previous equality with some null handling) and then use group value as partition for the rank() function.
-- sample data
WITH dataset (S_ID, Timestamp, O_ID) AS (
VALUES ('2e114e9f', timestamp '2021-11-26 08:57:44.049', NULL),
('2e114e9f', timestamp '2021-12-26 17:07:26.272', NULL),
('2e114e9f', timestamp '2021-12-27 08:13:24.277', NULL),
('2e114e9f', timestamp '2021-12-29 11:32:56.952', 2287549),
('2e114e9f', timestamp '2021-12-30 13:41:28.821', NULL),
('2e114e9f', timestamp '2021-12-30 19:53:28.590', NULL),
('2e114e9f', timestamp '2022-02-05 09:50:54.104', 2333002),
('2e114e9f', timestamp '2022-02-19 10:14:31.389', NULL)
)
--query
select S_ID,
Timestamp,
O_ID,
rank() OVER (PARTITION BY S_ID, grp ORDER BY timestamp asc) AS RANK
from(
select *,
sum(if(prev is not null and (O_ID is null or O_ID != prev), 1, 0))
OVER (PARTITION BY S_ID ORDER BY timestamp asc) as grp
from (
select *,
lag(O_ID) OVER (PARTITION BY S_ID ORDER BY timestamp asc) AS prev
from dataset
)
)
Output:
| S_ID | Timestamp | O_ID | RANK |
|---|---|---|---|
| 2e114e9f | 2021-11-26 08:57:44.049 | 1 | |
| 2e114e9f | 2021-12-26 17:07:26.272 | 2 | |
| 2e114e9f | 2021-12-27 08:13:24.277 | 3 | |
| 2e114e9f | 2021-12-29 11:32:56.952 | 2287549 | 4 |
| 2e114e9f | 2021-12-30 13:41:28.821 | 1 | |
| 2e114e9f | 2021-12-30 19:53:28.590 | 2 | |
| 2e114e9f | 2022-02-05 09:50:54.104 | 2333002 | 3 |
| 2e114e9f | 2022-02-19 10:14:31.389 | 1 |
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Guru Stron |
