'Count distinct value and group by another value in the last 24 hours SQL Redshift

I would like to do a count of B in the last 24 hours for grouped by date and A.

sample table:

|            DATE           |      A      |         B          |
----------------------------------------------------------------
|    2021-12-01 00:00:00    |    John     |      device 1      |
|    2021-12-01 01:00:00    |    Maria    |      device 1      |
|    2021-12-01 01:00:00    |    John     |      device 2      |
|    2021-12-01 02:00:00    |    John     |      device 3      |
|    2021-12-01 02:00:00    |    Maria    |      device 2      |
|    2021-12-03 05:00:00    |    John     |      device 4      |
|    2021-12-03 09:00:00    |    John     |      device 5      |

expected output:

|            DATE           |      A      |  devices_last_24h  |
----------------------------------------------------------------
|    2021-12-01 00:00:00    |    John     |          1         |
|    2021-12-01 01:00:00    |    Maria    |          1         |
|    2021-12-01 01:00:00    |    John     |          2         |
|    2021-12-01 02:00:00    |    John     |          3         |
|    2021-12-01 02:00:00    |    Maria    |          1         |
|    2021-12-03 05:00:00    |    John     |          1         |
|    2021-12-03 09:00:00    |    John     |          2         |

im using Redshift database.

Can someone help me please?

Thank you!



Solution 1:[1]

So this is a bit trick especially to do it so it will run fast. My first approach would be to self join the table with an inequality condition which works (see below). However, this creates a loop-join which can be slow if the table in question is very very large. Otherwise this should work fine.

Set up:

create table test (dt timestamp, a varchar(16), b varchar(16));

insert into test values
('2021-12-01 00:00:00','John','device 1'),
('2021-12-01 01:00:00','Maria','device 1'),
('2021-12-01 01:00:00','John','device 2'),
('2021-12-01 02:00:00','John','device 3'),
('2021-12-01 02:00:00','Maria','device 2'),
('2021-12-03 05:00:00','John','device 4'),
('2021-12-03 09:00:00','John','device 5');

Query:

select t1.dt, t1.a, count(t2.dt) as devices_last_24h
from test t1
join test t2
on t1.a = t2.a and t1.dt between t2.dt and t2.dt + interval '1 day'
group by 1, 2
order by t1.dt;

I did a fiddle here - http://sqlfiddle.com/#!15/5dd49/8

There are a few edge conditions that you may want to think about that aren't present in your test data. Like: multiple rows for the same person and timestamp or 2 timestamps exactly 24 hours apart, how should these be counted?

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Bill Weiner