'Google bigquery (GBQ) churn / retention rate from previous month
I'm using google bigquery and try to understand monthly churn from the following table, e.g. 1 user (C) churned, and 1 user (D) is new.
My thought was to self join and see which user shows 'null' in current month as a churn, 'null' in previous month is new. But the query returns to "left" join result when I'm using "full join". How should I fix it? I'll be open to other approach as well! thank you!
| userid | active_month |
|---|---|
| A | Jan 2022 |
| B | Jan 2022 |
| D | Jan 2022 |
| A | Dec 2021 |
| B | Dec 2021 |
| C | Dec 2021 |
| A | Nov 2021 |
| B | Nov 2021 |
select
t1.active_month,
count(distinct t1.userid) recent_user,
count(distinct t2.userid) previous_user,
count(distinct case when t1.userid is null then t2.userid end) as churned_user,
count(distinct case when t2.user is null then t1.userid end) as new_user
from table t1
full outer join table t2
on t1.active_month = datesub(t2.active_month, interval 1 month)
and t1.userid=t2.userid
Solution 1:[1]
The two problems I see with your query are:
- the join clause should be reversed, so subtracting 1 month from t1.active_date instead of t2.active_date
- when you select the t1.active_month you need to coalesce with t2.active_month because it's a full outer join
I have slightly changed your query and added one more condition in the calculation of churned_user to exclude the t2.userid with t2.active_month equal to the most recent active_month. Now it should be correct.
select
coalesce(t1.active_month, t2.active_month) as active_month,
count(distinct t1.userid) recent_user,
count(distinct IF(t2.userid is not null and t2.active_month is not null and t2.active_month <> (select max(active_month) from table), t2.userid, NULL)) previous_user,
count(distinct case when t1.userid is null and t2.active_month <> (select max(active_month) from table) then t2.userid end) as churned_user,
count(distinct case when t2.userid is null then t1.userid end) as new_user
from table t1
full outer join table t2
on date_sub(t1.active_month, interval 1 month) = t2.active_month and t1.userid = t2.userid
group by 1
order by 1
Just as an alternative approach the following would also calculate for each month:
- recent_user: as the number of active users for a month
- previous_user: as the number of active users for the previous month
- churned_user: as the number of users that were last active on a month
- new_user: as the number of users that are first seen on a month
Window functions LAG and LEAD come in handy for calculating the previous and next active month for each userid
WITH
table AS (
SELECT
MAX(active_month) OVER() AS current_month,
userid,
active_month,
LAG(active_month) OVER(PARTITION BY userid ORDER BY active_month) AS previous_month,
LEAD(active_month) OVER(PARTITION BY userid ORDER BY active_month) AS next_month
FROM
original_table
ORDER BY
userid,
active_month)
SELECT
active_month,
COUNT(userid) AS recent_user,
COUNTIF(DATE_DIFF(active_month, previous_month, MONTH) = 1) AS previous_user,
COUNTIF(next_month IS NULL AND active_month <> current_month) AS churned_user,
COUNTIF(previous_month IS NULL) AS new_user
FROM
TABLE
GROUP BY
active_month
ORDER BY
active_month
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
