'How to return the highest value in a sequence for each ID unless one has been skipped, with SQL?
I want to return the columns listed below but would like it grouped by JobNumber and only return the maximum StageNum so long as it has not skipped any stages.
For example, my table currently looks like this:
| ID | Stage | StageNum | Date |
|---|---|---|---|
| 104 | Released | 10 | 2022-02-07 |
| 104 | Slab | 20 | 2022-02-18 |
| 104 | Frame | 30 | 2022-03-07 |
| 104 | Mechanicals | 42 | 2022-03-10 |
| 105 | Released | 10 | 2022-02-07 |
| 105 | Slab | 20 | 2022-02-18 |
| 106 | Released | 10 | 2022-02-07 |
| 106 | Slab | 20 | 2022-02-18 |
| 106 | Frame | 30 | 2022-03-04 |
| 106 | Cornice | 40 | 2022-03-08 |
And the result should look like this with ID 104 returning StageNum 30 instead of 42 since StageNum 40 was skipped:
| ID | Stage | StageNum | Date |
|---|---|---|---|
| 104 | Frame | 30 | 2022-03-07 |
| 105 | Slab | 20 | 2022-02-18 |
| 106 | Cornice | 40 | 2022-03-08 |
I can return the highest value by grouping by ID and selecting MAX(StageNum) but do not know how to account for the skipped stages.
Solution 1:[1]
Provided there exists the stages table
select * into stages
from(
values
('Released', 10),
('Slab', 20),
('Frame', 30),
('Mechanicals', 42),
('Cornice', 40)
) t(Stage, StageNum)
You can compare positions requierd with the operations
select top(1) with ties ID, Stage, StageNum, Date
from (
select t.ID ,t.StageNum, t.Date, stn.Stage, stn.spos, row_number() over(partition by ID order by t.StageNum) tpos
from tbl t
join (
select StageNum, Stage, row_number() over(order by StageNum) spos
from stages
) stn on stn.StageNum = t.StageNum
) t
where tpos=spos
order by row_number() over(partition by ID order by StageNum desc)
Solution 2:[2]
Here is one possibility without creating a table to records stages. We use string_agg() and then check that all the stages are present. We could do this without a CTE but we would have to run string_agg several times per row.
create table c( ID int, Stage varchar(20), StageNum int, sDate Date); insert into c values (104,'Released',10,'2022-02-07'), (104,'Slab', 20,'2022-02-18'), (104,'Frame', 30,'2022-03-07'), (104,'Mechanicals',42,'2022-03-10'), (105,'Released', 10,'2022-02-07'), (105,'Slab', 20,'2022-02-18'), (106,'Released',10,'2022-02-07'), (106,'Slab', 20,'2022-02-18'), (106,'Frame', 30,'2022-03-04'), (106,'Cornice', 40,'2022-03-08');
select * from c;ID | Stage | StageNum | sDate --: | :---------- | -------: | :--------- 104 | Released | 10 | 2022-02-07 104 | Slab | 20 | 2022-02-18 104 | Frame | 30 | 2022-03-07 104 | Mechanicals | 42 | 2022-03-10 105 | Released | 10 | 2022-02-07 105 | Slab | 20 | 2022-02-18 106 | Released | 10 | 2022-02-07 106 | Slab | 20 | 2022-02-18 106 | Frame | 30 | 2022-03-04 106 | Cornice | 40 | 2022-03-08
select id, string_agg(stageNUM,'-') from c group by id;
id | (No column name) --: | :--------------- 104 | 10-20-30-42 105 | 10-20 106 | 10-20-30-40
with stages as( select id, string_agg(stageNUM,'') s from c group by id ) select id, case when s = '1020304042' then 'Mechanicals' when left(s,8) = '10203040' then 'Cornice' when left(s,6) = '102030' then 'Frame' when left(s,4) = '1020' then 'Slab' when left(s,2) = '10' then 'Released' else 'unknown' end Stage from stages
id | Stage --: | :------ 104 | Frame 105 | Slab 106 | Cornice
db<>fiddle here
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 |
