'Regex to extract domain from website column
I have a column called 'website' from which I require to extract the domain part only
Examples
- www.google.com -> google.com
- www.google.com/field -> google.com
- https://www.nearbyplaces.com -> nearbyplaces.com
- http://hcc.ca -> hcc.ca
- http://hcc.ca/info -> hcc.ca
- http://hcc.ca/ -> hcc.ca
What I have done so far:
select distinct website,
CASE WHEN website like '%//www.%' THEN REPLACE(REGEXP_SUBSTR(website,'//[^/\\\,=@\\+]+\\.[^/:;,\\\\\(\\)]+'),'//www.','')
WHEN website like '%//%' THEN REPLACE(REGEXP_SUBSTR(website,'//[^/\\\,=@\\+]+\\.[^/:;,\\\\\(\\)]+'),'//','')
WHEN website is null then null
WHEN website like '%www.%' THEN REPLACE(REGEXP_SUBSTR(website,'.([^/]*)'),'www.','')
else website
end as domain
Two Things: I am certain that I am missing out on some of the test cases. Secondly, I want to optimise the verbose in this solution. Any help in improving my snippet is appreciated. (FYI I am using Redshift )
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
