SQL to find the first occurrence of sets of data in a table
Say I have a table:
CREATE TABLE T
(
TableDTM TIMESTAMP NOT NULL,
Code INT NOT NULL
);
And I insert some rows:
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 10:00:00', 5);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 10:10:00', 5);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 10:20:00', 5);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 10:30:00', 5);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 10:40:00', 0);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 10:50:00', 1);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 11:00:00', 1);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 11:10:00', 1);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 11:20:00', 0);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 11:30:00', 5);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 11:40:00', 5);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 11:50:00', 3);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 12:00:00', 3);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 12:10:00', 3);
So I end up with a table similar to:
2011-01-13 10:00:00, 5
2011-01-13 10:10:00, 5
2011-01-13 10:20:00, 5
2011-01-13 10:30:00, 5
2011-01-13 10:40:00, 0
2011-01-13 10:50:00, 1
2011-01-13 11:00:00, 1
2011-01-13 11:10:00, 1
2011-01-13 11:20:00, 0
2011-01-13 11:30:00, 5
2011-01-13 11:40:00, 5
2011-01-13 11:50:00, 3
2011-01-13 12:00:00, 3
2011-01-13 12:10:00, 3
How can I select the first date of each set of identical numbers, so I end up with this:
2011-01-13 10:00:00, 5
2011-01-13 10:40:00, 0
2011-01-13 10:50:00, 1
2011-01-13 11:20:00, 0
2011-01-13 11:30:00, 5
2011-01-13 11:50:00, 3
I've been messing about with subqueries and the like for most of the day, and for some reason I can't seem to crack it. I'm sure there's a simple way somewhere!
I would probably want to exclude the 0s from the results, but that's not important for now.
Solution 1:[1]
Try this:
SELECT MIN(TableDTM) TableDTM, Code
FROM
(
SELECT T1.TableDTM, T1.Code, MIN(T2.TableDTM) XTableDTM
FROM T T1
LEFT JOIN T T2
ON T1.TableDTM <= T2.TableDTM
AND T1.Code <> T2.Code
GROUP BY T1.TableDTM, T1.Code
) X
GROUP BY XTableDTM, Code
ORDER BY 1;
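As a quick check of the self-join above, here is a minimal sketch that runs it against the question's sample rows. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for the asker's database, and the names SAMPLE, SELF_JOIN_SQL and run_starts are made up for the illustration.

```python
import sqlite3
from datetime import datetime, timedelta

# Codes from the question, one reading every 10 minutes starting at 10:00.
CODES = [5, 5, 5, 5, 0, 1, 1, 1, 0, 5, 5, 3, 3, 3]
START = datetime(2011, 1, 13, 10, 0, 0)
SAMPLE = [
    ((START + timedelta(minutes=10 * i)).strftime("%Y-%m-%d %H:%M:%S"), code)
    for i, code in enumerate(CODES)
]

# Assumption: an in-memory SQLite database stands in for the real server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (TableDTM TIMESTAMP NOT NULL, Code INT NOT NULL)")
conn.executemany("INSERT INTO T (TableDTM, Code) VALUES (?, ?)", SAMPLE)

# Inner query: for each row, XTableDTM is the earliest later row with a
# different Code (NULL for the final run). All rows of one run share that
# value, so grouping on (XTableDTM, Code) gives one group per run.
SELF_JOIN_SQL = """
SELECT MIN(TableDTM) AS TableDTM, Code
FROM (
    SELECT T1.TableDTM, T1.Code, MIN(T2.TableDTM) AS XTableDTM
    FROM T T1
    LEFT JOIN T T2
      ON T1.TableDTM <= T2.TableDTM
     AND T1.Code <> T2.Code
    GROUP BY T1.TableDTM, T1.Code
) X
GROUP BY XTableDTM, Code
ORDER BY 1
"""

run_starts = conn.execute(SELF_JOIN_SQL).fetchall()
for table_dtm, code in run_starts:
    print(table_dtm, code)
```

On the sample data this yields the six rows the question asks for. The self-join compares every row with every later row, so it may get slow on large tables.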
Solution 2:[2]
PostgreSQL supports window functions; have a look at them.
[EDIT] Try the following:
SELECT TableDTM, Code FROM
(
SELECT TableDTM,
Code,
LAG(Code, 1, NULL) OVER (ORDER BY TableDTM) AS PrevCode
FROM T
) AS X
WHERE PrevCode<>Code OR PrevCode IS NULL;
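The question also mentions wanting to leave the 0 rows out of the results. A minimal sketch of that variant follows, assuming SQLite 3.25 or newer (through Python's sqlite3 module) as a stand-in for PostgreSQL. The alias Flagged and the names SAMPLE, LAG_SQL and run_starts are invented for the example.

```python
import sqlite3
from datetime import datetime, timedelta

# Codes from the question, one reading every 10 minutes starting at 10:00.
CODES = [5, 5, 5, 5, 0, 1, 1, 1, 0, 5, 5, 3, 3, 3]
START = datetime(2011, 1, 13, 10, 0, 0)
SAMPLE = [
    ((START + timedelta(minutes=10 * i)).strftime("%Y-%m-%d %H:%M:%S"), code)
    for i, code in enumerate(CODES)
]

# Assumption: in-memory SQLite with window function support (3.25+).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (TableDTM TIMESTAMP NOT NULL, Code INT NOT NULL)")
conn.executemany("INSERT INTO T (TableDTM, Code) VALUES (?, ?)", SAMPLE)

# LAG looks at the previous row in time order. A row starts a run when it
# has no predecessor or its predecessor carries a different Code.
# The zero filter sits in the outer query, so the 0 rows still separate
# the two runs of 5 before they are dropped from the output.
LAG_SQL = """
SELECT TableDTM, Code
FROM (
    SELECT TableDTM,
           Code,
           LAG(Code) OVER (ORDER BY TableDTM) AS PrevCode
    FROM T
) AS Flagged
WHERE (PrevCode IS NULL OR PrevCode <> Code)
  AND Code <> 0
ORDER BY TableDTM
"""

run_starts = conn.execute(LAG_SQL).fetchall()
for table_dtm, code in run_starts:
    print(table_dtm, code)
```

Filtering inside the subquery instead would change the meaning: with the 0 rows gone before LAG runs, two runs of the same code separated only by zeros would merge into one.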
Solution 3:[3]
You could try something like:
SELECT DISTINCT Code, (SELECT MIN(TableDTM) FROM T AS Q WHERE Q.Code = T.Code) AS TableDTM FROM T;
If you need to exclude the 0s, change it to:
SELECT DISTINCT Code, (SELECT MIN(TableDTM) FROM T AS Q WHERE Q.Code = T.Code) As TableDTM FROM T WHERE Code <> 0;
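It may be worth checking this one against the sample data, because the subquery matches on Code alone and so returns the earliest timestamp per distinct code rather than per run. The sketch below assumes SQLite (through Python's sqlite3 module) in place of the asker's database. SAMPLE, PER_CODE_SQL and per_code are hypothetical names.

```python
import sqlite3
from datetime import datetime, timedelta

# Codes from the question, one reading every 10 minutes starting at 10:00.
CODES = [5, 5, 5, 5, 0, 1, 1, 1, 0, 5, 5, 3, 3, 3]
START = datetime(2011, 1, 13, 10, 0, 0)
SAMPLE = [
    ((START + timedelta(minutes=10 * i)).strftime("%Y-%m-%d %H:%M:%S"), code)
    for i, code in enumerate(CODES)
]

# Assumption: an in-memory SQLite database stands in for the real server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (TableDTM TIMESTAMP NOT NULL, Code INT NOT NULL)")
conn.executemany("INSERT INTO T (TableDTM, Code) VALUES (?, ?)", SAMPLE)

# The correlated subquery finds the earliest timestamp for each Code,
# regardless of how many separate runs that Code has.
PER_CODE_SQL = """
SELECT DISTINCT Code,
       (SELECT MIN(TableDTM) FROM T AS Q WHERE Q.Code = T.Code) AS TableDTM
FROM T
"""

# Sort in Python by timestamp to get a stable order for display.
per_code = sorted(conn.execute(PER_CODE_SQL).fetchall(), key=lambda r: r[1])
for code, table_dtm in per_code:
    print(table_dtm, code)

# Four rows come back, while the question's expected output has six:
# the second run of 5 (11:30) and the second 0 (11:20) are not reported.
print(len(per_code))
```

So this form fits when only the very first appearance of each code matters. For the run-by-run output in the question, one of the other approaches is probably a better match.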
Solution 4:[4]
Maybe I don't understand the question, but I don't see any mention of common table expressions or analytic functions. These are my weapons of choice for most problems, and when they can't handle it I start resorting to temporary tables.
I think I recently solved a similar problem, where I wanted to get the data of the first occurrence of an error when processing a daily interface file. Records on the interface that cause a problem are moved to a set of holding tables so the rest of the records can be processed.
-- EE with errors removed from most recent batch
with current_batch as (
select employee_number, PVL.ADDITIONAL_INFORMATION
from PERSONNEL_VALIDATION_LOG_X PVL
where PVL.PERSONNEL_BATCH_ID = EMPSRV.CURRENTPERSONNELBATCH(6,900)
)
, hist as (
select
row_number() over (
partition by X.EMPLOYEE_NUMBER, X.ADDITIONAL_INFORMATION
order by B.BATCH_STATUS_DATE
) as RN,
B.PERSONNEL_BATCH_ID BatchId,
B.SUBMITTAL_DATE,
X.EMPLOYEE_NUMBER EMPNUM,
MX.LAST_NAME,
MX.FIRST_NAME,
X.ADDITIONAL_INFORMATION
from PERSONNEL_VALIDATION_LOG_X X
join current_batch C on
X.Employee_number = C.EMPLOYEE_NUMBER
and X.additional_information = C.ADDITIONAL_INFORMATION
join empsrv.personnel_batch B
on B.PERSONNEL_BATCH_ID = X.PERSONNEL_BATCH_ID
join EMPSRV.PERSONNEL_MEMBER_DATA_X MX
on X.PERSONNEL_BATCH_ID = MX.PERSONNEL_BATCH_ID
and X.EMPLOYEE_NUMBER = MX.EMPLOYEE_NUMBER
)
select
batchId,
to_char(submittal_date, 'mm/dd/yyyy') First_Reported,
EmpNum,
Last_name,
first_name,
additional_information
from hist where rn = 1
order by submittal_date desc;
The first CTE just limits the population to current errors. The hist CTE goes through the logs and picks up the first occurrence of that error (i.e. same EE and message). This isn't perfect: if the error went away and came back, I would get the oldest occurrence and not the start of the most recent sequence. But this is good enough, and a recurrence is not likely given the shape of the error message itself. The final query just picks off the top row of each group, which will be the first occurrence.
The query takes a few seconds to run, but my logs are not especially large, so performance is almost never an issue for me. I also don't pay much attention to the dates on the questions.
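The caveat about getting the oldest occurrence rather than the start of the most recent sequence can be illustrated on the question's table T. This is only a sketch under assumptions: it uses SQLite 3.25 or newer (through Python's sqlite3 module) rather than the database the query above was written for, and the CTE names flagged and run_starts, like the Python names, are invented here.

```python
import sqlite3
from datetime import datetime, timedelta

# Codes from the question, one reading every 10 minutes starting at 10:00.
CODES = [5, 5, 5, 5, 0, 1, 1, 1, 0, 5, 5, 3, 3, 3]
START = datetime(2011, 1, 13, 10, 0, 0)
SAMPLE = [
    ((START + timedelta(minutes=10 * i)).strftime("%Y-%m-%d %H:%M:%S"), code)
    for i, code in enumerate(CODES)
]

# Assumption: in-memory SQLite with window function support (3.25+).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (TableDTM TIMESTAMP NOT NULL, Code INT NOT NULL)")
conn.executemany("INSERT INTO T (TableDTM, Code) VALUES (?, ?)", SAMPLE)

# Step 1 (flagged): attach the previous row's Code to every row.
# Step 2 (run_starts): keep only rows that begin a run, then number the
#   runs of each Code from newest to oldest.
# Step 3: RN = 1 is the start of the most recent run of each Code.
LATEST_RUN_SQL = """
WITH flagged AS (
    SELECT TableDTM, Code,
           LAG(Code) OVER (ORDER BY TableDTM) AS PrevCode
    FROM T
),
run_starts AS (
    SELECT TableDTM, Code,
           ROW_NUMBER() OVER (
               PARTITION BY Code
               ORDER BY TableDTM DESC
           ) AS RN
    FROM flagged
    WHERE PrevCode IS NULL OR PrevCode <> Code
)
SELECT TableDTM, Code
FROM run_starts
WHERE RN = 1
ORDER BY TableDTM
"""

latest_run_starts = conn.execute(LATEST_RUN_SQL).fetchall()
for table_dtm, code in latest_run_starts:
    print(table_dtm, code)
```

For code 5 this reports 11:30, the start of its latest run, where a plain row_number() partitioned by Code and ordered ascending would report 10:00.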
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | nvogel |
| Solution 2 | |
| Solution 3 | Ass3mbler |
| Solution 4 | Darrel Lee |
