'Partition by weeknumber and next weeknumber
I would like to count for each customer how many times a specific product was purchased in the past. I want to highlight the purchases for the same product (where the second order date is close to the first order date) with a rn = 2, so I can only count the rows with rn = 1
I created the following query and also included the current output. Its containing a partition by week number, to filter out purchases for the same product in the same week. It's working quite good, but the behaviour is not exactly what I was hoping for.
create table sandbox.hm_orders as
select o.customer_id
,o.product_id
,o.order_date
,ROW_NUMBER() over (partition by o.customer_id, o.product_id,concat(EXTRACT(year FROM order_date),EXTRACT(week FROM order_date)) order by o.order_date asc) as rn
,concat(EXTRACT(year FROM o.order_date),'_',EXTRACT(week FROM o.order_date)) as weeknr
from datamarts.orders o
where o.label_id = 1
and o.order_date > '2020-01-01'
and o.payment_status = 'PAID'
Current output:
| customer_id | product_ID | order_date | rn | weeknr |
|---|---|---|---|---|
| 4708818 | 128703 | 2020-05-11 20:19:25 | 1 | 2020_20 |
| 4708818 | 128703 | 2020-05-12 22:13:09 | 2 | 2020_20 |
| 4708818 | 128703 | 2020-06-06 21:45:04 | 1 | 2020_23 |
| 4708818 | 274578 | 2020-07-02 22:02:10 | 1 | 2020_27 |
| 4753958 | 137482 | 2021-03-14 18:13:04 | 1 | 2021_10 |
| 4753958 | 137482 | 2021-03-15 17:29:03 | 1 | 2021_11 |
As you can see in first two rows, the difference between the first the rows is 1 day and it will mark the second row with a rowNumber 2. For the last 2 rows, the difference between the orders is also 1 day. But since the weeknumbers are different, it will not give the second row a rowNumber 2.
Therefore I would like to find a way to also include the next weeknumber for the partition by. In this case, the order that have been done in 2021-11 needs a row number 2, and the week number 10 needs row number 1
desired output
| customer_id | product_ID | order_date | rn | weeknr |
|---|---|---|---|---|
| 4708818 | 128703 | 2020-05-11 20:19:25 | 1 | 2020_20 |
| 4708818 | 128703 | 2020-05-12 22:13:09 | 2 | 2020_20 |
| 4708818 | 128703 | 2020-06-06 21:45:04 | 1 | 2020_23 |
| 4708818 | 274578 | 2020-07-02 22:02:10 | 1 | 2020_27 |
| 4753958 | 137482 | 2021-03-14 18:13:04 | 1 | 2021_10 |
| 4753958 | 137482 | 2021-03-15 17:29:03 | 2 | 2021_11 |
Solution 1:[1]
A bit complicated way is to calculate a ranking by summing over a calculated flag.
Then use the rank in the row_number.
SELECT * , ROW_NUMBER() over (partition by customer_id, product_id, rnk order by order_date asc) as rn , TO_CHAR(order_date, 'YYYY_WW') as weeknr FROM ( SELECT * , SUM(flag) over (partition by customer_id, product_id order by order_date asc) as rnk FROM ( SELECT o.customer_id , o.product_id , o.order_date , CASE WHEN 1 >= DATE_PART('day', o.order_date - LAG(o.order_date) over (partition by o.customer_id, o.product_id order by o.order_date asc)) THEN 0 ELSE 1 END AS flag FROM orders o WHERE o.label_id = 1 AND o.order_date > '2020-01-01' AND o.payment_status = 'PAID' ) q1 ) q2
| customer_id | product_id | order_date | flag | rnk | rn | weeknr |
|---|---|---|---|---|---|---|
| 4708818 | 128703 | 2020-05-11 20:19:25 | 1 | 1 | 1 | 2020_19 |
| 4708818 | 128703 | 2020-05-12 22:13:09 | 0 | 1 | 2 | 2020_19 |
| 4708818 | 128703 | 2020-06-06 21:45:04 | 1 | 2 | 1 | 2020_23 |
| 4708818 | 274578 | 2020-07-02 22:02:10 | 1 | 1 | 1 | 2020_27 |
| 4753958 | 137482 | 2021-03-14 18:13:04 | 1 | 1 | 1 | 2021_11 |
| 4753958 | 137482 | 2021-03-15 17:29:03 | 0 | 1 | 2 | 2021_11 |
Test on db<>fiddle here
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
