'Rolling COUNT DISTINCT of n-day active users using T-SQL
I am counting 7-day active users using T-SQL. I used the following code:
SELECT
*,
COUNT(DISTINCT [UserID]) OVER (
PARTITION BY [HospitalID], [HospitalName], [Device]
ORDER BY [Date]
ROWS 7 PRECEDING
) AS [7-Day Active Users]
FROM UserActivity
ORDER BY [HospitalID], [HospitalName], [Device], [Date]
I was told Use of DISTINCT is not allowed with the OVER clause.
UserActivity is a table with columns HospitalID, HospitalName, Device (either phone or tablet), Date and UserID (could be NULL). To make things easier, I have filled the gaps between dates which made Date consecutive so I can use ROWS 7 PRECEDING with confidence. I did a lot of searches online and found most solution are either using other types of SQL (which is not possible in my case) or using DENSE_RANK function which does not support a moving window. What is the correct and hopefully simpler, concise way of solving my problem?
Solution 1:[1]
Sorry to see that COUNT DISTINCT was not supported in that type of SQL... I hadn't known that. Especially after you went to the trouble of fixing the gaps between dates!
I used Rasgo to generate the SQL -- so this won't work directly in your version (tested with Snowflake), but I think it will work as long as you fix the DATEADD function. Every RDBMS seems to do DATEADD differently, it seems.
The general concept here is to join the data upon itself using a range join condition in the WHERE clause.
Luckily, this should work for you without having to fix the gaps in the dates first.
WITH BASIC_OFFSET_7DAY AS (
SELECT
A.HOSPITALNAME,
A.HOSPITALID,
A.DEVICE,
A.DATE,
COUNT(DISTINCT B.USERID) as COUNT_DISTINCT_USERID_PAST7DAY,
COUNT(1) AS AGG_ROW_COUNT
FROM
UserActivity A
INNER JOIN UserActivity B ON A.HOSPITALNAME = B.HOSPITALNAME
AND A.HOSPITALID = B.HOSPITALID
AND A.DEVICE = B.DEVICE
WHERE
B.DATE >= DATEADD(day, -7, A.DATE)
AND B.DATE <= A.DATE
GROUP BY
A.HOSPITALNAME,
A.HOSPITALID,
A.DEVICE,
A.DATE
)
SELECT
src.*,
BASIC_OFFSET_7DAY.COUNT_DISTINCT_USERID_PAST7DAY
FROM
UserActivity src
LEFT OUTER JOIN BASIC_OFFSET_7DAY ON BASIC_OFFSET_7DAY.DATE = src.DATE
AND BASIC_OFFSET_7DAY.HOSPITALNAME = src.HOSPITALNAME
AND BASIC_OFFSET_7DAY.HOSPITALID = src.HOSPITALID
AND BASIC_OFFSET_7DAY.DEVICE = src.DEVICE
Let me know how that works out and if it doesn't work I'll help you out.
Edit: For those who are trying to do this and getting stuck, a common mistake (one that I myself performed when I did this by hand) is to pay careful attention to COUNT(DISTINCT(B.col)) and not A.col. When I used Rasgo to generate the SQL to check myself, I caught my mistake. Hopefully this note helps someone in the future who makes this same mistake!
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
