Regex to extract domain from website column

I have a column called 'website' from which I need to extract only the domain part.

Examples

  1. www.google.com -> google.com
  2. www.google.com/field -> google.com
  3. https://www.nearbyplaces.com -> nearbyplaces.com
  4. http://hcc.ca -> hcc.ca
  5. http://hcc.ca/info -> hcc.ca
  6. http://hcc.ca/ -> hcc.ca

What I have done so far:


select distinct website, 
    CASE WHEN website like '%//www.%' THEN REPLACE(REGEXP_SUBSTR(website,'//[^/\\\,=@\\+]+\\.[^/:;,\\\\\(\\)]+'),'//www.','') 
         WHEN website like '%//%' THEN REPLACE(REGEXP_SUBSTR(website,'//[^/\\\,=@\\+]+\\.[^/:;,\\\\\(\\)]+'),'//','') 
         WHEN website is null then null
         WHEN website like '%www.%' THEN REPLACE(REGEXP_SUBSTR(website,'.([^/]*)'),'www.','')
         else website
         end as domain


Two things: first, I am certain that I am missing some test cases. Second, I would like to make this solution less verbose. Any help in improving my snippet is appreciated. (FYI, I am using Redshift.)
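
One possible simplification, offered as a sketch rather than a tested answer: strip an optional scheme and an optional leading `www.` with a single anchored `REGEXP_REPLACE`, then keep everything before the first `/` with `SPLIT_PART`. The `samples` CTE is hypothetical and only reproduces the six examples listed above; it stands in for the real table, whose name the question does not give. The sketch assumes the scheme and `www.` are lowercase, and it does not try to handle ports, credentials in the URL, or a query string that follows the host without a `/`.

```sql
-- Hypothetical sample rows copied from the examples in the question.
WITH samples AS (
    SELECT 'www.google.com' AS website
    UNION ALL SELECT 'www.google.com/field'
    UNION ALL SELECT 'https://www.nearbyplaces.com'
    UNION ALL SELECT 'http://hcc.ca'
    UNION ALL SELECT 'http://hcc.ca/info'
    UNION ALL SELECT 'http://hcc.ca/'
)
SELECT DISTINCT
    website,
    -- Inner call: remove an optional "http://" or "https://" and an
    -- optional "www." from the start of the string.
    -- Outer call: keep the text before the first "/" (the whole string
    -- when there is no "/").
    SPLIT_PART(
        REGEXP_REPLACE(website, '^(https?://)?(www\\.)?', ''),
        '/',
        1
    ) AS domain
FROM samples;
```

Because both steps are plain function calls, the `CASE` branches for the different prefixes collapse into one expression. A NULL `website` should come through as NULL without a dedicated branch, though that is worth confirming on the actual data.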



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow