'How to extract a substring of a URL with regex

Working with some unstructured data and hoping to extract utm_campaign names from URLs. Here's an example of one URL. I'd like to extract the name following "utm_campaign=" but before "&utm_source" each time. How can I do this with regex?

https://data.io/?utm_campaign=branded&utm_source=google&utm_medium=cpc&utm_term=data&gclid=CjwKCAjwiuuRBhBvEiwAFXKaNGwDI2MaPLQpb9duvQZ2XeOA4vpk7FuBTqhYiJ2ysnirdUh8SZ6-zxoC4VYQAvD_BwE

I've tried various regex functions, for instance:

regexp_substr(get_path(event_properties, 'url'), '(\[A-Za-z\_-\]\*)-utm_campaign')


Solution 1:[1]

Easier done with parse_url

select parse_url(your_url):parameters:"utm_campaign"::string

Or you can divide and conquer using split_part

select split_part(split_part(your_url,'utm_campaign=',-1),'&',1)

Solution 2:[2]

Try this one:

SELECT regexp_substr(get_path(event_properties, 'url'), '\\b+[-utm_campaign=](\\w+)&\\b+', 1, 1, 'e', 1)

Reference: REGEXP_SUBSTR

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Phil Coulson
Solution 2 Michael Golos