'Regex that matches strings with specific text not between text in BigQuery

I have the following strings:

step_1->step_2->step_3
step_1->step_3
step_1->step_2->step_1->step_3
step_1->step_2->step_1->step_2->step_3

What I would like to do is to capture the ones that between step_1 and step 3 there's no step_2.

The results should be like this:

string                                   result
step_1->step_2->step_3                    false
step_1->step_3                             true
step_1->step_2->step_1->step_3             true
step_1->step_2->step_1->step_2->step_3    false

I have tried to use the negative lookahead but I found out that BigQuery doesn't support it. Any ideas?



Solution 1:[1]

Consider also below option

select string, 
  not regexp_contains(string, r'step_1->(step_2->)+step_3\b') as result
from your_table

Solution 2:[2]

I believe @Daniel_Zagales answer is the one you were expecting. However here is a broader solution that can maybe be interesting in your usecase:it consists in using arrays

 WITH sample AS (
  SELECT 'step_1->step_2->step_3' AS path
  UNION ALL SELECT 'step_1->step_3'
  UNION ALL SELECT 'step_1->step_2->step_1->step_3'
  UNION ALL SELECT 'step_1->step_2->step_1->step_2->step_3'
),
  temp AS (
  SELECT
    path,
    SPLIT(REGEXP_REPLACE(path,'step_', ''), '->') AS sequences
  FROM
    sample)
SELECT
  path,
  position,
  flattened AS current_step,
  LAG(flattened) OVER (PARTITION BY path ORDER BY OFFSET ) AS previous_step,
  LEAD(flattened) OVER (PARTITION BY path ORDER BY OFFSET ) AS following_step
FROM
  temp,
  temp.sequences AS flattened
WITH
OFFSET AS position
 

This query returns the following table

enter image description here

The concept is to get an array of the step number (splitting on '->' and erasing 'step_') and to keep the OFFSET (crucial as UNNESTing arrays does not guarantee keeping the order of an array).

The table obtained contains for each path and step of said path, the previous and following step. It is therefore easy to test for instance if successive steps have a difference of 1.

(SELECT * FROM <previous> WHERE ABS(current_step-previous_step) != 1 for example)

(CASTing to INT required)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Mikhail Berlyant
Solution 2 Cylldby