'SQL - check if a date is within 14d in another column and return the most recent date
I have a data set that has the following columns: user_id, A_view_dt, A_conversion_dt B_view_dt.
I wanted to check If A_conversion_dt IS NOT NULL, then see if there's another record in B_view_dt that belongs to the same user_id happened 14d prior to A_conversion_dt, if so return the B_view_dt value. Otherwise return NULL under B_view_dt.
IF A_conversion_dt IS NULL, then I'd like to compare A_view_dt and B_view_dt. If a B_view_dt record exists and happens 14d before then keep the most recent max(B_view_dt).
IF B_view_dt IS NULL then I'd like B_view_dt to return NULL.
I have attached a dummy dataset & desired output. I've been trying using UNION ALL and LEFT JOINs but there's always records being missing from the WHERE clause. Anybody has a better solve? Does using partition in this case help?
Thank you!!!
Solution 1:[1]
I doubt there is an efficient way to run this without some preprocessing especially for large dataset.
Basically you need 2 virtual datasets.
- Unique
user_idand choice of conversion/view date. Let's call thisA_dt. In my result, I addedcutoffto help with some debugging and make query simpler. Call thisa - Unique
user_idandB_View_dt. Call thisb
Select all data from a then find max B_View_dt within range of cutoff and A_dt.
Using t-sql, I use CTE to help create virtual tables. See link to test solution - http://sqlfiddle.com/#!18/4db34/5
;with a (user_id, A_dt, cutoff) as
(
select
user_id,
isnull(A_Conversion_dt,
A_view_dt) A_dt,
dateadd(d,-14, isnull(A_Conversion_dt,A_view_dt)) cutoff
from
t
group by
user_id,
isnull(A_Conversion_dt,
A_view_dt)
),
b (user_id, B_dt) as
(
select
user_id,
B_View_dt B_dt
from
t
group by
user_id,
B_View_dt
)
select
a.*,
MaxB_dt
from
a
outer apply
(
select
max(B_dt) MaxB_dt
from
b
where
a.user_id = b.user_id
and B_dt between cutoff and A_dt
) ob
Sample Data
Result from query
Solution 2:[2]
select user_id,
min(A_view_dt) as A_view_dt, min(A_conversion_dt) as A_conversion_dt,
max(case when
datediff(day, B_view_dt, coalesce(A_conversion_dt, A_view_dt)) between 0 and 14
then B_view_dt end) as B_view_dt
from T
group by user_id
It appears safe to make the assumption that all rows (per user) have the same values in the first two date columns so I am running with that in this query.
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=764c2857cc69190fca3fbd08de3e544c
Or perhaps you want all the date combinations to appear separately?:
select user_id, A_view_dt, A_conversion_dt,
max(case when
datediff(day, B_view_dt, coalesce(A_conversion_dt, A_view_dt)) between 0 and 14
then B_view_dt end) as B_view_dt
from T
group by user_id, A_view_dt, A_conversion_dt
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 |


