SQL to find the first occurrence of sets of data in a table

Say I have a table:

CREATE TABLE T
(
    TableDTM  TIMESTAMP  NOT NULL,
    Code      INT        NOT NULL
);

And I insert some rows:

INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 10:00:00', 5);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 10:10:00', 5);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 10:20:00', 5);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 10:30:00', 5);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 10:40:00', 0);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 10:50:00', 1);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 11:00:00', 1);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 11:10:00', 1);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 11:20:00', 0);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 11:30:00', 5);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 11:40:00', 5);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 11:50:00', 3);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 12:00:00', 3);
INSERT INTO T (TableDTM, Code) VALUES ('2011-01-13 12:10:00', 3);

So I end up with a table similar to:

2011-01-13 10:00:00, 5
2011-01-13 10:10:00, 5
2011-01-13 10:20:00, 5
2011-01-13 10:30:00, 5
2011-01-13 10:40:00, 0
2011-01-13 10:50:00, 1
2011-01-13 11:00:00, 1
2011-01-13 11:10:00, 1
2011-01-13 11:20:00, 0
2011-01-13 11:30:00, 5
2011-01-13 11:40:00, 5
2011-01-13 11:50:00, 3
2011-01-13 12:00:00, 3
2011-01-13 12:10:00, 3

How can I select the first date of each set of identical numbers, so I end up with this:

2011-01-13 10:00:00, 5
2011-01-13 10:40:00, 0
2011-01-13 10:50:00, 1
2011-01-13 11:20:00, 0
2011-01-13 11:30:00, 5
2011-01-13 11:50:00, 3

I've been messing about with subqueries and the like for most of the day, and for some reason I can't seem to crack it. I'm sure there's a simple way somewhere!

I would probably want to exclude the 0s from the results, but that's not important for now.



Solution 1:[1]

Try this:

SELECT MIN(TableDTM) TableDTM, Code
FROM
(
    SELECT T1.TableDTM, T1.Code, MIN(T2.TableDTM) XTableDTM
    FROM T T1
    LEFT JOIN T T2
    ON T1.TableDTM <= T2.TableDTM
    AND T1.Code <> T2.Code
    GROUP BY T1.TableDTM, T1.Code
) X
GROUP BY XTableDTM, Code
ORDER BY 1;

Solution 2:[2]

PostgreSQL supports window functions; have a look at those.

[EDIT] Try the following:

SELECT TableDTM, Code FROM
(
    SELECT TableDTM,
           Code,
           LAG(Code, 1, NULL) OVER (ORDER BY TableDTM) AS PrevCode
    FROM   T
) AS X
WHERE PrevCode<>Code OR PrevCode IS NULL;
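
To sanity-check the LAG approach against the sample data, and to drop the 0 rows the question mentions, here is a minimal sketch. It assumes Python's built-in sqlite3 module with an in-memory database as a stand-in for PostgreSQL (chosen only for convenience; it needs SQLite 3.25 or later for window functions). The names `load_sample` and `first_of_each_run` are made up for illustration. Note that the `Code <> 0` filter sits in the outer query, so the zero rows still separate runs before being discarded.

```python
import sqlite3
from datetime import datetime, timedelta

# Sample data from the question: one row every 10 minutes from 10:00.
CODES = [5, 5, 5, 5, 0, 1, 1, 1, 0, 5, 5, 3, 3, 3]
START = datetime(2011, 1, 13, 10, 0, 0)


def load_sample():
    """Build an in-memory table T holding the question's rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE T (TableDTM TEXT NOT NULL, Code INT NOT NULL)")
    rows = [
        ((START + timedelta(minutes=10 * i)).strftime("%Y-%m-%d %H:%M:%S"), code)
        for i, code in enumerate(CODES)
    ]
    conn.executemany("INSERT INTO T (TableDTM, Code) VALUES (?, ?)", rows)
    return conn


# Same idea as the query above, plus the outer filter on Code <> 0.
QUERY = """
SELECT TableDTM, Code
FROM (
    SELECT TableDTM,
           Code,
           LAG(Code) OVER (ORDER BY TableDTM) AS PrevCode
    FROM T
) AS X
WHERE (PrevCode IS NULL OR PrevCode <> Code)
  AND Code <> 0
ORDER BY TableDTM
"""


def first_of_each_run(conn):
    """Return (TableDTM, Code) for the first row of every non-zero run."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in first_of_each_run(load_sample()):
        print(row)
```

On the sample data this yields four rows: 10:00 (code 5), 10:50 (code 1), 11:30 (code 5) and 11:50 (code 3).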

Solution 3:[3]

Could you try something like this:

SELECT DISTINCT Code, (SELECT MIN(TableDTM) FROM T AS Q WHERE Q.Code = T.Code) AS TableDTM FROM T;

And if you need to exclude the 0, change it to:

SELECT DISTINCT Code, (SELECT MIN(TableDTM) FROM T AS Q WHERE Q.Code = T.Code) AS TableDTM FROM T WHERE Code <> 0;
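
It may help to check what this query returns for the sample data. Below is a minimal sketch, assuming an in-memory SQLite database (via Python's sqlite3 module) as a stand-in for the real server; the alias `FirstSeen` and the function names are made up here. Because the subquery correlates only on `Code`, the result has one row per distinct code, that is, the earliest time each code appears anywhere in the table, rather than one row per consecutive run.

```python
import sqlite3
from datetime import datetime, timedelta

# Sample data from the question: one row every 10 minutes from 10:00.
CODES = [5, 5, 5, 5, 0, 1, 1, 1, 0, 5, 5, 3, 3, 3]
START = datetime(2011, 1, 13, 10, 0, 0)


def load_sample_table():
    """Build an in-memory table T holding the question's rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE T (TableDTM TEXT NOT NULL, Code INT NOT NULL)")
    rows = [
        ((START + timedelta(minutes=10 * i)).strftime("%Y-%m-%d %H:%M:%S"), code)
        for i, code in enumerate(CODES)
    ]
    conn.executemany("INSERT INTO T (TableDTM, Code) VALUES (?, ?)", rows)
    return conn


# The DISTINCT + correlated MIN query, with an explicit ordering added.
PER_CODE_QUERY = """
SELECT DISTINCT Code,
       (SELECT MIN(TableDTM) FROM T AS Q WHERE Q.Code = T.Code) AS FirstSeen
FROM T
ORDER BY FirstSeen
"""


def first_seen_per_code(conn):
    """Return (Code, FirstSeen): the earliest timestamp for each code."""
    return conn.execute(PER_CODE_QUERY).fetchall()


if __name__ == "__main__":
    for row in first_seen_per_code(load_sample_table()):
        print(row)
```

For the sample rows this gives four results (codes 5, 0, 1 and 3), so the second run of 5 starting at 11:30 and the second 0 at 11:20 are not reported.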

Solution 4:[4]

Maybe I don't understand the question, but I don't see any mention of Common Table Expressions or analytic functions. These are my weapons of choice for most problems, and when they can't handle it I start resorting to temporary tables.

I think I recently solved a similar problem, where I wanted to get the data of the first occurrence of an error when processing a daily interface file. Records on the interface that cause a problem are moved to a set of holding tables so the rest of the records can be processed.

-- EE with errors removed from most recent batch
with current_batch as (
      select employee_number, PVL.ADDITIONAL_INFORMATION
      from PERSONNEL_VALIDATION_LOG_X PVL
      where PVL.PERSONNEL_BATCH_ID = EMPSRV.CURRENTPERSONNELBATCH(6,900)
)
, hist as (
  select 
    row_number() over (
      partition by X.EMPLOYEE_NUMBER, X.ADDITIONAL_INFORMATION
      order by B.BATCH_STATUS_DATE
    ) as RN,
    B.PERSONNEL_BATCH_ID BatchId,
    B.SUBMITTAL_DATE,
    X.EMPLOYEE_NUMBER EMPNUM,
    MX.LAST_NAME,
    MX.FIRST_NAME,
    X.ADDITIONAL_INFORMATION
  from PERSONNEL_VALIDATION_LOG_X X
  join current_batch C on
    X.Employee_number = C.EMPLOYEE_NUMBER
    and X.additional_information = C.ADDITIONAL_INFORMATION
  join empsrv.personnel_batch B 
    on B.PERSONNEL_BATCH_ID = X.PERSONNEL_BATCH_ID
  join EMPSRV.PERSONNEL_MEMBER_DATA_X MX
    on X.PERSONNEL_BATCH_ID = MX.PERSONNEL_BATCH_ID
      and X.EMPLOYEE_NUMBER = MX.EMPLOYEE_NUMBER
)
select 
  batchId, 
  to_char(submittal_date, 'mm/dd/yyyy') First_Reported,
  EmpNum, 
  Last_name, 
  first_name, 
  additional_information
from hist where rn = 1
order by submittal_date desc;

The first CTE just limits the population to current errors. The hist CTE goes through the logs and picks up the first occurrence of each error (i.e. same EE and message). This isn't perfect: if an error went away and came back, I would get the oldest occurrence and not the start of the most recent sequence. But this is good enough, and that case is not likely given the shape of the error message itself. The final query just picks off the top row of each group, which will be the first occurrence.
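
The caveat about an error going away and coming back is exactly the situation in the original question, where code 5 appears in two separate runs. One way `row_number()` can be adapted for that, sketched here and not part of the query above, is to subtract a per-code row number from the overall row number: the difference stays constant within a consecutive run, so it can serve as a group key. The example assumes Python's sqlite3 module (SQLite 3.25 or later) as a stand-in database; `grp`, `FirstDTM` and the function names are illustrative.

```python
import sqlite3
from datetime import datetime, timedelta

# Sample data from the question: one row every 10 minutes from 10:00.
CODES = [5, 5, 5, 5, 0, 1, 1, 1, 0, 5, 5, 3, 3, 3]
START = datetime(2011, 1, 13, 10, 0, 0)


def build_table():
    """Build an in-memory table T holding the question's rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE T (TableDTM TEXT NOT NULL, Code INT NOT NULL)")
    rows = [
        ((START + timedelta(minutes=10 * i)).strftime("%Y-%m-%d %H:%M:%S"), code)
        for i, code in enumerate(CODES)
    ]
    conn.executemany("INSERT INTO T (TableDTM, Code) VALUES (?, ?)", rows)
    return conn


# grp = overall row number minus row number within the same Code.
# Consecutive rows with the same Code share one grp value.
RUN_START_QUERY = """
WITH numbered AS (
    SELECT TableDTM,
           Code,
           ROW_NUMBER() OVER (ORDER BY TableDTM)
             - ROW_NUMBER() OVER (PARTITION BY Code ORDER BY TableDTM) AS grp
    FROM T
)
SELECT MIN(TableDTM) AS FirstDTM, Code
FROM numbered
GROUP BY Code, grp
ORDER BY FirstDTM
"""


def run_starts(conn):
    """Return (FirstDTM, Code) for the start of every consecutive run."""
    return conn.execute(RUN_START_QUERY).fetchall()


if __name__ == "__main__":
    for row in run_starts(build_table()):
        print(row)
```

On the question's data this returns the six rows the asker listed, including both runs of 5 and both single 0 rows.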

The query takes a few seconds to run, but my logs are not especially large, so performance is almost never an issue for me. I also don't pay much attention to the dates on the questions.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 nvogel
Solution 2
Solution 3 Ass3mbler
Solution 4 Darrel Lee