'SQL aggregation based on time and column change
Have a situation where I'm trying to calculate start, end time and time spent in different areas in a location.
I have the data with timestamp and the location of the person when captured by the system.
The normal case is when a location change and in that case, the end time should be the previous seen value. The exception is when a person is not seen for 5m or more, in which case the endTime shall be the last seen (see above the line 2 and 3 of the aggregation output required).
Original data
date, zone
8h10m, room1
8h12m, room1
8h15m, hall
8h16m, hall
8h25m, hall
8h29m, hall
8h30m, room2
8h34m, room2
8h38m, room2
8h42m, room2
Aggregation/Summary required in the following way (or similar):
startDate, endDate, time, zone
8h10m, 8h12m, 3m, room1
8h15m, 8h16m, 2m, hall <-- special case time >5m
8h25m, 8h29m, 5m, hall
8h30, 8h42m, 9n, room2
Can you give me an idea of how can I make such "aggregation/summary" in SQL? I'm using BigQuery, but I believe standard SQL should do the job.
Thanks,
Rui
Solution 1:[1]
Consider below approach
select
min(date) as startDate, max(date) as endDate,
time_diff(max(date), min(date), minute) + 1 as time, zone
from (
select *, countif(new_zone) over (partition by zone order by date) as zone_number
from (
select *,
ifnull(date - lag(date) over (partition by zone order by date) > make_interval(minute => 5)
or zone != lag(zone) over(order by date), true) as new_zone
from your_table
)
)
group by zone, zone_number
if applied to sample data in your question
with your_table as (
select time "8:10:00" as date, "room1" as zone union all
select "8:12:00", "room1" union all
select "8:15:00", "hall" union all
select "8:16:00", "hall" union all
select "8:25:00", "hall" union all
select "8:29:00", "hall" union all
select "8:30:00", "room2" union all
select "8:34:00", "room2" union all
select "8:38:00", "room2" union all
select "8:42:00", "room2"
)
output is
Solution 2:[2]
From Mikhail Berlyant solution, the concept with countif has been used to simplify this query. This answer is capable of identifing every move, also if the person re-eneters a room withhin 5 minutes. See addition data provided in the table.
There are several step needed:
- add lines with zones
---when 5 minutes no data is there: Setover_5:minutestrue when difference from previous line (lag) to current line is over 5 minutes.unnest([0,1]) as xto duplicate dataset andqualifyto include dataset in this case. - sort all of the following statements by the column
date, xin theover(order by date, x) - with
lagobtain the last room and the last date. Because of the unnest x, look two rows back. - compare the last room to the current one, if they differ, set the
zone_changeto true. countif(zone_change)from 1st to current date to obtain thezone_id. This curresponds to a single zone.- To this
zone_idcount the time thexwas 0; this is the case when 5 minutes no position was given. group by zone_idand calculate the min and max date- remove the
---zones by filtering
With tbl as
(
SELECT TIME "8:10:00" as date, "room1" as zone
UNION ALL SELECT TIME "8:12:00", "room1"
UNION ALL SELECT TIME "8:15:00", "hall"
UNION ALL SELECT TIME "8:16:00", "hall"
UNION ALL SELECT TIME "8:25:00", "hall"
UNION ALL SELECT TIME "8:29:00", "hall"
UNION ALL SELECT TIME "8:30:00", "room2"
UNION ALL SELECT TIME "8:34:00", "room2"
UNION ALL SELECT TIME "8:38:00", "room2"
UNION ALL SELECT TIME "8:42:00", "room2"
UNION ALL SELECT TIME "8:43:00", "hall"
UNION ALL SELECT TIME "8:44:00", "room2"
)
SELECT
zone_id,
zone,
min(date) as startDate,
max(date) as endDate,
time_diff(max(date),min(date),minute)+1 as time_minutes
FROM
(
SELECT *,
countif(x=0) over (ORDER BY date,x RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)+
countif(zone_change) over (ORDER BY date,x RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as zone_id
FROM
(
SELECT date,x,if(x=1,zone,"---") as zone,
time_diff(date,lag(date,2) over (order by date),minute)>5 as over_5_minutes,
zone!=lag(zone,2) over (order by date,x) as zone_change
FROM tbl, unnest([0,1]) as x
Qualify over_5_minutes or x=1
)
)
where zone!="---"
group by 1,2
order by 1
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 |

