'How to Query New, Repeating, and Lapsed Customers with SQL
I am working on a customer lifecycle analysis in which I need to group account IDs and transaction activity into three buckets: New, Repeating, and Lapsed Transactors.
I am having trouble segmenting new and repeating customers when a customer has transacted more than once in the same day. By our definition, when a customer transacts more than once in the same day with a different purchase_order_id they should be labeled as a 'Repeat Transactor'.
The transaction table named direct_transactions contains the following columns:
| account_id | country_iso_cd | purchase_order_id | product_desc | transaction_date |
|---|---|---|---|---|
| 642250536147127524 | US | 1015182481 | product_1 | 2020-09-18 10:58:02.000 |
| 4984541218433989577 | US | 1015469885 | product_2 | 2020-11-13 13:35:53.000 |
To make things even more difficult, the timestamp for transaction_date contains hour, minute, and second. Therefore, I've used DATE_TRUNC throughout my query to more easily align transaction time stamps that occur same day. However, since my final join relies on transaction_date as a key I believe this is causing NULLS to populate.
I've provided a screen capture of an example where account_id purchased twice in the same day with different purchase_order_ids. I would like both records to populate as 'Repeat Transactor'.
Here is the query I've developed, thanks in advance for helping me solve a very tricky problem:
WITH
order_sequence AS (
SELECT
DATE_TRUNC('day',transaction_date) AS transaction_date,
purchase_order_id,
account_id,
ROW_NUMBER() OVER (PARTITION BY account_id ORDER BY
DATE_TRUNC('day',transaction_date) ASC) as account_order_sequence,
LAG(DATE_TRUNC('day',transaction_date)) OVER (PARTITION BY account_id ORDER BY
DATE_TRUNC('day',transaction_date) ASC) as previous_order_date
FROM direct_transactions
GROUP BY 1,2,3
),
time_between_transactions AS (
SELECT
DATE_TRUNC('day',transaction_date) AS transaction_date,
purchase_order_id,
account_id,
account_order_sequence,
CASE WHEN previous_order_date IS NULL THEN transaction_date
ELSE previous_order_date
END AS previous_order_date,
DATEDIFF(DAY, previous_order_date, DATE_TRUNC('day',transaction_date)) AS days_between_transactions
FROM order_sequence a
),
customer_life_cycle AS (
SELECT
DATE_TRUNC('day',transaction_date) AS transaction_date,
purchase_order_id,
account_id,
CASE WHEN account_order_sequence = 1 THEN 'New Transactor'
WHEN days_between_transactions > 0 AND days_between_transactions <= 365 THEN 'Repeat Transactor'
WHEN days_between_transactions > 365 THEN 'Lapsed Transactor'
ELSE 'Unknown'
END AS customer_life_cycle,
account_order_sequence,
previous_order_date,
CASE
WHEN days_between_transactions IS NULL THEN 0
ELSE days_between_transactions
END AS days_between_transactions
FROM time_between_transactions
)
SELECT
DATE_TRUNC('day',t1.transaction_date) AS transaction_date,
t2.previous_order_date,
t1.account_id,
t2.customer_life_cycle,
t2.account_order_sequence,
t2.days_between_transactions,
t1.purchase_order_id,
t1.product_desc,
t1.country_iso_cd
FROM direct_transactions AS t1
LEFT JOIN customer_life_cycle AS t2
ON
(t1.account_id=t2.account_id
AND t1.transaction_date=t2.transaction_date)
Solution 1:[1]
If you just create a CTE that does a count distinct on the purchase_order_id, won't that tell you the information you need - if the count > 1 then then it's a repeat purchaser?
WITH
daily_counts AS (
SELECT
DATE_TRUNC('day',transaction_date) AS transaction_date,
account_id,
COUNT(DISTINCT purchase_order_id) AS daily_distinct_orders
FROM direct_transactions
GROUP BY 1,2
)...
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | NickW |

