'postgresql: define query to get all first occurring events pr group
I have the following data structure:
CREATE TABLE test_table(
id int8 PRIMARY KEY,
patientid VARCHAR(255) NOT NULL,
studyid VARCHAR(255) NOT NULL,
gid VARCHAR(255) NOT NULL,
labels int8,
timestamp TIMESTAMP
);
With the following data:
INSERT INTO test_table(id, patientid, studyid, gid, labels, timestamp)
VALUES (1, 1, 1, 1, 0, '2015-01-11 00:51:14'),
(4, 1, 1, 1, 1, '2015-01-11 00:54:14'),
(3, 1, 1, 1, 3, '2015-01-11 00:53:14'),
(2, 1, 1, 1, 0, '2015-01-11 00:52:14'),
(5, 1, 1, 1, 0, '2015-01-11 00:55:14'),
(6, 1, 2, 2, 1, '2015-01-12 08:10:14'),
(7, 1, 2, 2, 0, '2015-01-12 08:11:14'),
(8, 1, 2, 2, 1, '2015-01-12 08:13:25'),
(9, 2, 1, 3, 0, '2015-01-12 09:14:25'),
(10, 2, 1, 3, 0, '2015-01-12 09:15:25'),
(11, 2, 1, 3, 2, '2015-01-12 09:17:25'),
(12, 2, 1, 3, 6, '2015-01-12 09:18:25'),
(13, 3, 1, 4, 0, '2015-01-13 07:14:25'),
(14, 3, 1, 4, 0, '2015-01-13 07:15:25'),
(15, 3, 1, 4, 1, '2015-01-13 07:17:25'),
(16, 3, 1, 4, 0, '2015-01-13 07:18:25'),
(17, 3, 1, 4, 0, '2015-01-13 07:18:25'),
(18, 4, 1, 5, 0, '2015-01-13 03:14:25'),
(19, 4, 1, 5, 0, '2015-01-13 03:15:25'),
(20, 4, 1, 5, 0, '2015-01-13 03:17:25'),
(21, 4, 1, 5, 1, '2015-01-13 03:18:25'),
(22, 4, 2, 6, 0, '2015-01-13 03:12:13'),
(23, 4, 2, 6, 1, '2015-01-13 03:14:07'),
(24, 4, 2, 6, 1, '2015-01-13 03:17:05'),
(25, 4, 2, 6, 0, '2015-01-13 03:19:42'),
(26, 4, 2, 6, 0, '2015-01-13 03:22:51')
;
I seek to retrieve the first occurring rows with labels=0 grouped by the combination of patientid and studyid.
The desired output contains the the following row ids: (1, 2, 7, 9, 10, 13, 14, 18, 19, 20, 22)
I have tried many things, but nothing seems viable. As an example I have tried the following (special thanks for help in this thread):
select rr.* from (
select tt.*,
-- rank by group and sort
rank() over (partition by patientid, studyid order by timestamp) rank
from test_table tt where labels = 0) rr
where rr.rank <= 20;
Unfortunately, the query above selects all 16 rows with labels=0. The number 20 is just an upper bound for how many occurrences of the labels=0 I suspect is possible.
I hope someone has the solution and an explanation on how to get the desired output.
Solution 1:[1]
select rr.* from (
select tt.*,
-- rank by group and sort
rank() over (partition by patientid, studyid order by timestamp) rank
from test_table tt where labels = 0) rr
-- you wanted only the first row in your desc; rank=1
-- your test output is rank <=2
where rr.rank <= 2;
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | PrasadU |
