'How to Query New, Repeating, and Lapsed Customers with SQL

I am working on a customer lifecycle analysis in which I need to group account IDs and transaction activity into three buckets: New, Repeating, and Lapsed Transactors.

I am having trouble segmenting new and repeating customers when a customer has transacted more than once in the same day. By our definition, when a customer transacts more than once in the same day with a different purchase_order_id they should be labeled as a 'Repeat Transactor'.

The transaction table named direct_transactions contains the following columns:

account_id country_iso_cd purchase_order_id product_desc transaction_date
642250536147127524 US 1015182481 product_1 2020-09-18 10:58:02.000
4984541218433989577 US 1015469885 product_2 2020-11-13 13:35:53.000

To make things even more difficult, the timestamp for transaction_date contains hour, minute, and second. Therefore, I've used DATE_TRUNC throughout my query to more easily align transaction time stamps that occur same day. However, since my final join relies on transaction_date as a key I believe this is causing NULLS to populate.

I've provided a screen capture of an example where account_id purchased twice in the same day with different purchase_order_ids. I would like both records to populate as 'Repeat Transactor'.

enter image description here

Here is the query I've developed, thanks in advance for helping me solve a very tricky problem:

WITH 

order_sequence AS ( 

  SELECT 
  DATE_TRUNC('day',transaction_date) AS transaction_date,
  purchase_order_id,
  account_id,
  ROW_NUMBER() OVER (PARTITION BY account_id ORDER BY 
  DATE_TRUNC('day',transaction_date) ASC) as account_order_sequence,
  LAG(DATE_TRUNC('day',transaction_date)) OVER (PARTITION BY account_id ORDER BY 
  DATE_TRUNC('day',transaction_date) ASC) as previous_order_date
  FROM direct_transactions 
  GROUP BY 1,2,3
),


time_between_transactions AS (
  
  SELECT
    DATE_TRUNC('day',transaction_date) AS transaction_date,
    purchase_order_id,
    account_id,
    account_order_sequence,
    
    CASE WHEN previous_order_date IS NULL THEN transaction_date
    ELSE previous_order_date 
    END AS previous_order_date,
    
    DATEDIFF(DAY, previous_order_date, DATE_TRUNC('day',transaction_date)) AS days_between_transactions
 
FROM order_sequence a
),
    
customer_life_cycle AS ( 

  SELECT 
   DATE_TRUNC('day',transaction_date) AS transaction_date,
   purchase_order_id,
   account_id,

   CASE WHEN account_order_sequence = 1 THEN 'New Transactor'
        WHEN days_between_transactions > 0 AND days_between_transactions <= 365 THEN 'Repeat Transactor'
        WHEN days_between_transactions > 365 THEN 'Lapsed Transactor'
        ELSE 'Unknown'
        END AS customer_life_cycle,

   account_order_sequence,
   previous_order_date,

   CASE 
   WHEN days_between_transactions IS NULL THEN 0
   ELSE days_between_transactions
   END AS days_between_transactions

FROM time_between_transactions
)
    
   
SELECT 
DATE_TRUNC('day',t1.transaction_date) AS transaction_date,
t2.previous_order_date,
t1.account_id,
t2.customer_life_cycle,
t2.account_order_sequence,
t2.days_between_transactions,
t1.purchase_order_id,
t1.product_desc,
t1.country_iso_cd


FROM direct_transactions AS t1
LEFT JOIN customer_life_cycle AS t2

ON 
(t1.account_id=t2.account_id 
AND t1.transaction_date=t2.transaction_date)


Solution 1:[1]

If you just create a CTE that does a count distinct on the purchase_order_id, won't that tell you the information you need - if the count > 1 then then it's a repeat purchaser?

WITH 
daily_counts AS ( 

  SELECT 
  DATE_TRUNC('day',transaction_date) AS transaction_date,
  account_id,  
  COUNT(DISTINCT purchase_order_id) AS daily_distinct_orders
  FROM direct_transactions 
  GROUP BY 1,2
)...

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 NickW