'find all occurrences of a regex as an array
have the following string (it's a salesforce query, but not important):
IF(OR(CONTAINS(EmailDomain,"yahoo"),CONTAINS(EmailDomain,"gmail"),
CONTAINS("protonmail.com,att.net,chpmail.com,smail.com",EmailDomain)),
"Free Mail","Business Email")
and I want to get an array of all substrings that are encapsulated between double quotes like so:
['yahoo',
'gmail',
'protonmail.com,att.net,chpmail.com,smail.com',
'Free Mail',
'Business Email']
in python I do:
re.findall(r'"(.+?)"', <my string>)
but is there a way to replicate this in Snowflake?
I've tried
SELECT
REGEXP_SUBSTR('IF(OR(CONTAINS(EmailDomain,"yahoo"),CONTAINS(EmailDomain,"gmail"),
CONTAINS("protonmail.com,att.net,chpmail.com,smail.com",EmailDomain)),
"Free Mail","Business Email")', '"(.+?)"') as emails;
but I get this:
"yahoo"),CONTAINS(EmailDomain,"gmail"
Solution 1:[1]
Wiktor's answer works great. I'm adding an alternate answer for anyone who needs to do this and their quoted strings may contain the pipe | character. Using the replacement method on strings containing pipe(s) will split the string into more than one array member. Here's a way (not the only way) to do it that will work in case the quoted strings could potentially contain pipe characters:
set col = $$IF(OR(CONTAINS(EmailDomain,"yahoo"),CONTAINS(EmailDomain,"gmail"),CONTAINS("protonmail.com,att.net,chpmail.com,smail.com",EmailDomain)),"Free Mail","Business Email | Other")$$;
create or replace function GET_QUOTED_STRINGS("s" string)
returns array
language javascript
strict immutable
as
$$
var re = /(["'])(?:\\.|[^\\])*?\1/g;
var m;
var out = [];
do {
m = re.exec(s);
if (m) {
out.push(m[0].replace(/['"]+/g, ''));
}
} while (m);
return out;
$$;
select get_quoted_strings($col);
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Greg Pavlik |
