'How to check the overlapping time intervals from one type 2 SCD dimension

I have one problem identifying and fixing some records having overlapping time intervals, for one scd type 2 dimension. What I have is:

Bkey   Uid  startDate                       endDate
'John'  1   1990-01-01 (some time stamp)    2017-01-10 (some time stamp)
'John'  2   2016-11=03 (some time stamp)    2016-11-14 (some time stamp)
'John'  3   2016-11-14 (some time stamp)    2016-12-29 (some time stamp)
'John'  4   2016-12-29 (some time stamp)    2017-01-10 (some time stamp)
'John'  5   2017-01-10 (some time stamp)    2017-04-22 (some time stamp)
......

I want to find (first) which are all the Johns having overlapping time periods, for a table having lots and lots of Johns and then to figure out a way to correct those overlapping time periods. For the latest I know there are some function LAGG, LEAD, which can handle that, but it eludes me how to find those over lappings. Any hints? Regards,



Solution 1:[1]

I think the tricky part of your query is being able to articulate the logic for overlapping ranges. We can self join on the condition that a row on the left overlaps with any row on the right. All matching rows are those which overlap.

We can think of four possible overlap scenarios:

|---------|   |---------|    no overlap

|---------|
       |---------|           1st end and 2nd start overlap

       |---------|
 |---------|                 1st start and 2nd end overlap

 |---------|
    |---|                    2nd completely contained inside 1st
                             (could be 1st inside 2nd also)

SELECT DISTINCT
    t.Uid
FROM yourTable t1
INNER JOIN yourTable t2
    ON t1.startDate <= t2.endDate AND
       t2.startDate <= t1.endDate
WHERE
    t1.Bkey = 'John' AND t2.Bkey = 'John'

This will at least let you identify overlapping records. Updating and separating them in a meaningful way will probably end up being an ugly gaps and islands problem, perhaps meriting another question.

Solution 2:[2]

we can acheive this by doing a self join of emp table.

  • a.emp_id != b.emp_id ensures same row is not joined with itself.

  • remaining comparison clause checks if any row's start date or end date falls in other row's date range.

      create table emp(name varchar(20), emp_id numeric(10), start_date date, end_date date);
    
      insert into emp values('John', 1, '1990-01-01', '2017-01-10');
      insert into emp values( 'John', 2, '2016-11-03', '2016-11-14');
      insert into emp values( 'John', 3, '2016-11-14', '2016-12-29');
      insert into emp values( 'John', 4, '2016-12-29', '2017-01-10');
      insert into emp values( 'John', 5, '2017-01-11', '2017-04-22');
    
      commit;
    
      with A as (select * from EMP),
      B as (select * from EMP)
      select A.* from A,B where A.EMP_ID != B.EMP_ID
      and A.START_DATE < B.END_DATE and B.START_DATE < A.END_DATE
      and (A.START_DATE between B.START_DATE and B.END_DATE
      or A.END_DATE between B.START_DATE and B.END_DATE);
    

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2