'Can Match_Recognize() Find Conversion Rates for Multiple Patterns?
I am analyzing user patterns in Snowflake for an ecommerce website. I would like to be able to match various patterns of a user flow (e.g. did they complete an order after viewing a specific page? did they complete an order after selecting an add to cart from a particular portion of the page? etc.).
Is it possible to calculate conversion rates for multiple patterns with a match recognize function?
The data structure looks something like this:
CREATE TEMPORARY TABLE AS events_and_visits (
VISIT_ID bigint,
EVENT_ID bigint,
EVENT_NAME VARCHAR,
REFERENCE VARCHAR
);
INSERT INTO events_and_visits VALUES
(1, 1, 'productView', 'reco'),
(1, 2, 'Add To Cart', 'reco'),
(1, 3, 'Order Complete', NULL),
(2, 4, 'productView', 'reco'),
(3, 5, 'productView', 'reco'),
(3, 6, 'Add To Cart', 'merchant'),
(4, 7, 'productView', 'reco'),
(4, 8, 'productView', 'reco'),
(4, 9, 'Add To Cart', 'merchant'),
(4, 10, 'Order Complete', NULL);
My failed attempt
SELECT *
FROM
events_and_visits MATCH_RECOGNIZE(
PARTITION BY visit_id
ORDER BY
event_id
MEASURES
match_number() AS match_number,
classifier() AS clf
ALL ROWS PER MATCH WITH UNMATCHED ROWS
PATTERN (
(product_rec_view + atc_merchant * | atc_rec *) * oc * --THIS IS SO F*****
)
DEFINE
product_rec_view AS (
event_name = 'productView'
AND reference = 'reco'
),
atc_rec AS (
event_name = 'Add To Cart'
AND reference = 'reco'
),
atc_merchant AS (
event_name = 'Add To Cart'
AND reference = 'merchant'
),
oc AS event_name = 'Order Complete'
);
Solution 1:[1]
The data insert can be made simple (and thus only need one execution):
INSERT INTO events_and_visits VALUES
(1, 1, 'productView', 'reco'),
(1, 2, 'Add To Cart', 'reco'),
(1, 3, 'Order Complete', NULL),
(2, 4, 'productView', 'reco'),
(3, 5, 'productView', 'reco'),
(3, 6, 'Add To Cart', 'merchant'),
(4, 7, 'productView', 'reco'),
(4, 8, 'productView', 'reco'),
(4, 9, 'Add To Cart', 'merchant'),
(4, 10, 'Order Complete', NULL);
both (product_rec_view+ (atc_merchant | atc_rec))? oc* or product_rec_view+ (atc_merchant | atc_rec) oc? gives me what I feel like you are wanting, but it's hard to fully understand you intent
SELECT *
FROM
events_and_visits MATCH_RECOGNIZE(
PARTITION BY visit_id
ORDER BY
event_id
MEASURES
MATCH_SEQUENCE_NUMBER() AS mseq,
match_number() AS match_number,
classifier() AS clf
ALL ROWS PER MATCH WITH UNMATCHED ROWS
PATTERN (
(product_rec_view+ (atc_merchant | atc_rec))? oc*
--product_rec_view+ (atc_merchant | atc_rec) oc?
)
DEFINE
product_rec_view AS (
event_name = 'productView' AND reference = 'reco'
),
atc_rec AS (
event_name = 'Add To Cart' AND reference = 'reco'
),
atc_merchant AS (
event_name = 'Add To Cart' AND reference = 'merchant'
),
oc AS event_name = 'Order Complete'
)
ORDER BY 1,2;
| VISIT_ID | EVENT_ID | EVENT_NAME | REFERENCE | MSEQ | MATCH_NUMBER | CLF |
|---|---|---|---|---|---|---|
| 1 | 1 | productView | reco | 1 | 1 | PRODUCT_REC_VIEW |
| 1 | 2 | Add To Cart | reco | 2 | 1 | ATC_REC |
| 1 | 3 | Order Complete | 3 | 1 | OC | |
| 2 | 4 | productView | reco | 1 | 1 | |
| 3 | 5 | productView | reco | 1 | 1 | PRODUCT_REC_VIEW |
| 3 | 6 | Add To Cart | merchant | 2 | 1 | ATC_MERCHANT |
| 4 | 7 | productView | reco | 1 | 1 | PRODUCT_REC_VIEW |
| 4 | 8 | productView | reco | 2 | 1 | PRODUCT_REC_VIEW |
| 4 | 9 | Add To Cart | merchant | 3 | 1 | ATC_MERCHANT |
| 4 | 10 | Order Complete | 4 | 1 | OC |
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Simeon Pilgrim |
