'How to check the overlapping time intervals from one type 2 SCD dimension
I have one problem identifying and fixing some records having overlapping time intervals, for one scd type 2 dimension. What I have is:
Bkey Uid startDate endDate
'John' 1 1990-01-01 (some time stamp) 2017-01-10 (some time stamp)
'John' 2 2016-11=03 (some time stamp) 2016-11-14 (some time stamp)
'John' 3 2016-11-14 (some time stamp) 2016-12-29 (some time stamp)
'John' 4 2016-12-29 (some time stamp) 2017-01-10 (some time stamp)
'John' 5 2017-01-10 (some time stamp) 2017-04-22 (some time stamp)
......
I want to find (first) which are all the Johns having overlapping time periods, for a table having lots and lots of Johns and then to figure out a way to correct those overlapping time periods. For the latest I know there are some function LAGG, LEAD, which can handle that, but it eludes me how to find those over lappings. Any hints? Regards,
Solution 1:[1]
I think the tricky part of your query is being able to articulate the logic for overlapping ranges. We can self join on the condition that a row on the left overlaps with any row on the right. All matching rows are those which overlap.
We can think of four possible overlap scenarios:
|---------| |---------| no overlap
|---------|
|---------| 1st end and 2nd start overlap
|---------|
|---------| 1st start and 2nd end overlap
|---------|
|---| 2nd completely contained inside 1st
(could be 1st inside 2nd also)
SELECT DISTINCT
t.Uid
FROM yourTable t1
INNER JOIN yourTable t2
ON t1.startDate <= t2.endDate AND
t2.startDate <= t1.endDate
WHERE
t1.Bkey = 'John' AND t2.Bkey = 'John'
This will at least let you identify overlapping records. Updating and separating them in a meaningful way will probably end up being an ugly gaps and islands problem, perhaps meriting another question.
Solution 2:[2]
we can acheive this by doing a self join of emp table.
a.emp_id != b.emp_id ensures same row is not joined with itself.
remaining comparison clause checks if any row's start date or end date falls in other row's date range.
create table emp(name varchar(20), emp_id numeric(10), start_date date, end_date date); insert into emp values('John', 1, '1990-01-01', '2017-01-10'); insert into emp values( 'John', 2, '2016-11-03', '2016-11-14'); insert into emp values( 'John', 3, '2016-11-14', '2016-12-29'); insert into emp values( 'John', 4, '2016-12-29', '2017-01-10'); insert into emp values( 'John', 5, '2017-01-11', '2017-04-22'); commit; with A as (select * from EMP), B as (select * from EMP) select A.* from A,B where A.EMP_ID != B.EMP_ID and A.START_DATE < B.END_DATE and B.START_DATE < A.END_DATE and (A.START_DATE between B.START_DATE and B.END_DATE or A.END_DATE between B.START_DATE and B.END_DATE);
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 |
