'Regex that matches strings with specific text not between text in BigQuery
I have the following strings:
step_1->step_2->step_3
step_1->step_3
step_1->step_2->step_1->step_3
step_1->step_2->step_1->step_2->step_3
What I would like to do is to capture the ones that between step_1 and step 3 there's no step_2.
The results should be like this:
string result
step_1->step_2->step_3 false
step_1->step_3 true
step_1->step_2->step_1->step_3 true
step_1->step_2->step_1->step_2->step_3 false
I have tried to use the negative lookahead but I found out that BigQuery doesn't support it. Any ideas?
Solution 1:[1]
Consider also below option
select string,
not regexp_contains(string, r'step_1->(step_2->)+step_3\b') as result
from your_table
Solution 2:[2]
I believe @Daniel_Zagales answer is the one you were expecting. However here is a broader solution that can maybe be interesting in your usecase:it consists in using arrays
WITH sample AS (
SELECT 'step_1->step_2->step_3' AS path
UNION ALL SELECT 'step_1->step_3'
UNION ALL SELECT 'step_1->step_2->step_1->step_3'
UNION ALL SELECT 'step_1->step_2->step_1->step_2->step_3'
),
temp AS (
SELECT
path,
SPLIT(REGEXP_REPLACE(path,'step_', ''), '->') AS sequences
FROM
sample)
SELECT
path,
position,
flattened AS current_step,
LAG(flattened) OVER (PARTITION BY path ORDER BY OFFSET ) AS previous_step,
LEAD(flattened) OVER (PARTITION BY path ORDER BY OFFSET ) AS following_step
FROM
temp,
temp.sequences AS flattened
WITH
OFFSET AS position
This query returns the following table
The concept is to get an array of the step number (splitting on '->' and erasing 'step_') and to keep the OFFSET (crucial as UNNESTing arrays does not guarantee keeping the order of an array).
The table obtained contains for each path and step of said path, the previous and following step. It is therefore easy to test for instance if successive steps have a difference of 1.
(SELECT * FROM <previous> WHERE ABS(current_step-previous_step) != 1 for example)
(CASTing to INT required)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Mikhail Berlyant |
| Solution 2 | Cylldby |

