How to extract a substring of a URL with regex
I'm working with some unstructured data and want to extract utm_campaign values from URLs. Here's an example URL. In each URL, I'd like to extract the value that follows "utm_campaign=" and comes before "&utm_source". How can I do this with regex?
https://data.io/?utm_campaign=branded&utm_source=google&utm_medium=cpc&utm_term=data&gclid=CjwKCAjwiuuRBhBvEiwAFXKaNGwDI2MaPLQpb9duvQZ2XeOA4vpk7FuBTqhYiJ2ysnirdUh8SZ6-zxoC4VYQAvD_BwE
I've tried various regex functions, for instance:
regexp_substr(get_path(event_properties, 'url'), '([A-Za-z_-]*)-utm_campaign')
Solution 1:[1]
This is easier with PARSE_URL, which returns the query-string parameters as an object you can index by name:
select parse_url(your_url):parameters:"utm_campaign"::string
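For comparison outside Snowflake, here is a minimal Python sketch of the same "parse the query string, then look up the parameter" idea, using the standard library's urllib.parse instead of PARSE_URL. The helper name utm_campaign and the url variable are illustrative only, not part of the original answer.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def utm_campaign(url: str) -> Optional[str]:
    # Split the URL into components, then parse the query string into a
    # dict that maps each parameter name to a list of its values.
    params = parse_qs(urlparse(url).query)
    values = params.get("utm_campaign")
    # Return the first value if the parameter is present, otherwise None.
    return values[0] if values else None


url = ("https://data.io/?utm_campaign=branded&utm_source=google&utm_medium=cpc"
       "&utm_term=data&gclid=CjwKCAjwiuuRBhBvEiwAFXKaNGwDI2MaPLQpb9duvQZ2XeOA4vpk7FuBTqhYiJ2ysnirdUh8SZ6-zxoC4VYQAvD_BwE")
print(utm_campaign(url))  # branded
```

parse_qs also percent-decodes values (for example, %20 becomes a space), so this sketch returns the decoded campaign name.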
Or you can divide and conquer with SPLIT_PART: take everything after 'utm_campaign=', then keep the part before the next '&':
select split_part(split_part(your_url,'utm_campaign=',-1),'&',1)
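The nested SPLIT_PART call can be mirrored in Python with str.split. This is a sketch assuming the URL is a plain string; the function name is hypothetical. Note that Python indexes parts from 0 (and -1 for the last part), whereas SPLIT_PART counts from 1.

```python
def utm_campaign_split(url: str) -> str:
    # split(...)[-1] keeps the text after the last "utm_campaign="
    # (like SPLIT_PART with part number -1).
    after = url.split("utm_campaign=")[-1]
    # split("&")[0] keeps the text before the first "&"
    # (like SPLIT_PART with part number 1).
    return after.split("&")[0]


url = ("https://data.io/?utm_campaign=branded&utm_source=google&utm_medium=cpc"
       "&utm_term=data&gclid=CjwKCAjwiuuRBhBvEiwAFXKaNGwDI2MaPLQpb9duvQZ2XeOA4vpk7FuBTqhYiJ2ysnirdUh8SZ6-zxoC4VYQAvD_BwE")
print(utm_campaign_split(url))  # branded
```

One caveat of this sketch: if utm_campaign is missing, nothing is split off, so it returns the URL up to its first "&" rather than an empty value.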
Solution 2:[2]
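Try REGEXP_SUBSTR with a capture group. The 'e' parameter returns a submatch, the last argument selects group 1, and [^&]+ matches everything up to the next '&', so values containing hyphens work too:
SELECT regexp_substr(get_path(event_properties, 'url'), 'utm_campaign=([^&]+)', 1, 1, 'e', 1)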
Reference: REGEXP_SUBSTR
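A minimal Python sketch of the capture-group approach, using re.search with the literal pattern utm_campaign=([^&]+). A bracketed class such as [-utm_campaign=] would match a single character from that set rather than the text "utm_campaign=", which is why the literal form is used here. The helper name is hypothetical.

```python
import re
from typing import Optional

# Capture the text after "utm_campaign=" up to the next "&" or the end of
# the string. [^&]+ accepts hyphens and dots, which \w+ would reject.
CAMPAIGN_RE = re.compile(r"utm_campaign=([^&]+)")


def utm_campaign_regex(url: str) -> Optional[str]:
    m = CAMPAIGN_RE.search(url)
    # group(1) is the first capture group, analogous to group_num 1.
    return m.group(1) if m else None


url = ("https://data.io/?utm_campaign=branded&utm_source=google&utm_medium=cpc"
       "&utm_term=data&gclid=CjwKCAjwiuuRBhBvEiwAFXKaNGwDI2MaPLQpb9duvQZ2XeOA4vpk7FuBTqhYiJ2ysnirdUh8SZ6-zxoC4VYQAvD_BwE")
print(utm_campaign_regex(url))  # branded
```

Unlike a full query-string parser, this returns the raw value without percent-decoding it.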
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Author |
|---|---|
| Solution 1 | Phil Coulson |
| Solution 2 | Michael Golos |
