'How to create an ID to identify groups within table - PostgreSQL
I have a dataset that looks like this:
| key_field | report_year | quarter | report_month | old_status | new_status | old_date | new_date | old_hist_id | new_hist_id | date_range |
|---|---|---|---|---|---|---|---|---|---|---|
| Unq_010203 | 2021 | 2 | 4 | REJECTED | REJECTED | 4/1/21 20:17 | 4/5/21 22:06 | 1 | 2 | 4 |
| Unq_010203 | 2021 | 2 | 4 | REJECTED | APPROVED | 4/5/21 22:06 | 4/6/21 16:42 | 2 | 3 | 0 |
| Unq_010203 | 2021 | 2 | 4 | APPROVED | 4/6/21 16:42 | 3 | ||||
| Unq_010203 | 2021 | 2 | 5 | REJECTED | REJECTED | 5/30/21 18:55 | 5/31/21 9:36 | 4 | 5 | 0 |
| Unq_010203 | 2021 | 2 | 5 | REJECTED | APPROVED | 5/31/21 9:36 | 4/14/22 12:00 | 5 | 6 | 317 |
| Unq_010203 | 2022 | 2 | 4 | APPROVED | 4/14/22 12:00 | 6 |
This dataset snippet relates to a single key_field, which is at the level I need the grouping happening.
I need to create a campaign identifier within that dataset using the following criteria. If a key_field gets approved, a new campaign will start. If the date_range of a key_field has gone over 90 days, a new campaign will start. Resulting in a table that looks like this:
| key_field | report_year | quarter | report_month | old_status | new_status | old_date | new_date | old_hist_id | new_hist_id | date_range | CAMPAIGN_ID |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Unq_010203 | 2021 | 2 | 4 | REJECTED | REJECTED | 4/1/21 20:17 | 4/5/21 22:06 | 1 | 2 | 4 | 1 |
| Unq_010203 | 2021 | 2 | 4 | REJECTED | APPROVED | 4/5/21 22:06 | 4/6/21 16:42 | 2 | 3 | 0 | 1 |
| Unq_010203 | 2021 | 2 | 4 | APPROVED | 4/6/21 16:42 | 3 | 1 | ||||
| Unq_010203 | 2021 | 2 | 5 | REJECTED | REJECTED | 5/30/21 18:55 | 5/31/21 9:36 | 4 | 5 | 0 | 2 |
| Unq_010203 | 2021 | 2 | 5 | REJECTED | APPROVED | 5/31/21 9:36 | 4/14/22 12:00 | 5 | 6 | 317 | 2 |
| Unq_010203 | 2022 | 2 | 4 | APPROVED | 4/14/22 12:00 | 6 | 3 |
I have tried using RANK() OVER() but I can't seem to get the partitions right.
Table CREATE statement:
CREATE TABLE omg_fml
(key_field VARCHAR(10),
report_year INT(4),
quarter INT(1),
report_month INT(1),
old_status VARCHAR(8),
new_status VARCHAR(8),
old_date DATE,
new_date DATE,
old_hist_id INT(2),
new_hist_id INT(2),
date_range INT(3));
Table INSERT statement:
INSERT
INTO omg_fml
VALUES ( 'Unq_010203', 2021, 2, 4, 'REJECTED', 'REJECTED', '2021-04-04', '2021-04-05', 1, 2, 4 )
, ( 'Unq_010203', 2021, 2, 4, 'REJECTED', 'APPROVED', '2021-04-05', '2021-04-06', 2, 3, 0 )
, ( 'Unq_010203', 2021, 2, 4, 'APPROVED', NULL, '2021-04-06', NULL, 3, NULL, NULL )
, ( 'Unq_010203', 2021, 2, 5, 'REJECTED', 'REJECTED', '2021-05-30', '2021-05-31', 4, 5, 0 )
, ( 'Unq_010203', 2021, 2, 5, 'REJECTED', 'APPROVED', '2021-05-31', '2022-04-14', 5, 6, 317 )
, ( 'Unq_010203', 2022, 2, 4, 'APPROVED', NULL, '2022-04-14', NULL, 6, NULL, NULL );
PostgreSQL version 8.0 (done in Redshift)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
