'Reduce table records based on minimum time difference

I have a log table (MS SQL SERVER) with event entries (events are user actions like "user logged in", "user viewed entity A" etc).

Some events like "user viewed entity A" may occur multiple times within a short time frame. For instance if a user goes back and forward in his browser he may enter entity A's page multiple times within a minute, and multiple "user view" events will be logged.

For my analytics dashboard I would like to count how many times a user viewed entity A, but I would like to "debounce" the result. I want to consider multiple "user view" events close to one another as one "user view" event. Specifically, I want to consider a new "user view" event only if it is more than 30 minutes from the last one.

So having a table like this (last column is my comments for clarity):

timestamp evt_type user_id entity_id *time diff from previous event
15:30 ENTITY_VIEW U1 E1 NULL (first view)
15:38 ENTITY_VIEW U1 E1 8mins
16:05 ENTITY_VIEW U1 E1 28mins
16:50 ENTITY_VIEW U1 E1 45mins (this counts as new view)
17:15 ENTITY_VIEW U1 E1 25mins
17:44 ENTITY_VIEW U1 E1 29mins
18:30 ENTITY_VIEW U1 E1 46mins (this counts as another view)

I would like to determine that the user "viewed" the entity 3 times.

What would be a query to determine this? I tried LEAD, LAG, PARTITION BY and other comnbinations but I don't seem to find the correct way as I am not an SQL expert.



Solution 1:[1]

Should be a simple LAG() to grab the previous timestamp and check the diff. Will say your column [timestamp] is an odd data type, what about different days? Is there a separate column for date?

Return Records >30 Minutes from Previous Record

WITH cte_DeltaSinceLastView AS (
    SELECT *
        /*Grab previous record for each user_id/entity_id combo*/
        ,PrevTimestamp = LAG([timestamp]) OVER (PARTITION BY [user_id],[entity_id] ORDER BY [timestamp])
    FROM YourTable
    ) AS A(ID,[user_id],[entity_id],[timestamp])
)
SELECT *,MinutesSinceLastView = DATEDIFF(minute,PrevTimestamp,[Timestamp])
FROM cte_DeltaSinceLastView
WHERE DATEDIFF(minute,PrevTimestamp,[timestamp]) > 30 /*Over 30 minutes between last view*/
OR PrevTimestamp IS NULL /*First view will not have previous timestamp to compare against*/

Solution 2:[2]

Something you could try is a correlated subquery that disregards any rows that are within 30 minutes of previous rows, the remaining rows should be the ones that qualify (ie a gap of 30+ minutes exists). See if this works for you?

select Sum(vc) as ViewedCount
from (
    select case when exists (
      select * from t t2 
        where t2.timestamp > t.timestamp 
          and t2.evt_type = t.evt_type 
          and t2.user_id = t.user_id 
          and t2.entity_id = t.entity_id
          and DateDiff(minute, t.timestamp,t2.timestamp) <30 
      ) then 0 else 1 end vc
    from t
)b;

This assumes that Timestamp is a time data type. This won't work across day boundaries but the same concept should apply with a datetime type.

Demo as Fiddle

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Stephan
Solution 2