Data Studio: REGEXP_EXTRACT the last part of a URL

I am looking to extract the last part of the URL, which looks like this:

https://www.website.com/cat1/cat2/naming/id.html

I've been trying to edit this:

REGEXP_EXTRACT(Product URL,'/([\\w-]+)$')

and I'm having a lot of trouble trying to get just id as the output.

The expression above returns null. If I remove the $, I get www.

What is the best way to get the id, that is, the part between the last slash and the .html?



Solution 1:[1]

You can use

REGEXP_EXTRACT(Product URL,'/([^/]*)\\.[^/.]*$')


Details

  • / - a / char
  • ([^/]*) - Group 1: any zero or more chars other than /
  • \. - a literal . char (written as \\. inside the string literal above)
  • [^/.]* - zero or more chars other than / and .
  • $ - end of string.
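
As a quick way to check the pattern outside Data Studio, here is a minimal sketch in Python. It assumes Python's re module behaves like Data Studio's RE2 engine for these simple patterns (it should, since no advanced features are used), and extract_id is a hypothetical helper name. It also shows why the original attempt fails: [\w-]+ cannot match the . in id.html, so nothing matches with $, and without $ the first match is www.

```python
import re

# Regex-level form of the pattern (single backslash, via a raw string).
PATTERN = re.compile(r"/([^/]*)\.[^/.]*$")


def extract_id(url):
    """Return the text between the last slash and the last dot, or None."""
    match = PATTERN.search(url)
    return match.group(1) if match else None


url = "https://www.website.com/cat1/cat2/naming/id.html"

print(extract_id(url))  # id

# The original attempt from the question, with and without the $ anchor.
print(re.search(r"/([\w-]+)$", url))          # None: "." is not matched by [\w-]
print(re.search(r"/([\w-]+)", url).group(1))  # www: first slash followed by word chars
```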

Another possible solution is matching up to the first . char:

/([^./]*)[^/]*$

Here, ([^./]*) captures into Group 1 zero or more chars other than . and /, and then [^/]*$ matches zero or more chars other than / up to the end of the string.
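
The two patterns give the same result for id.html but differ when the last path segment contains more than one dot, or none at all. The sketch below compares them in Python, again assuming Python's re module matches Data Studio's behavior for these patterns. The example.com URLs and the extract helper are made up for illustration.

```python
import re

# Captures up to the LAST dot in the final path segment.
LAST_DOT = re.compile(r"/([^/]*)\.[^/.]*$")
# Captures up to the FIRST dot in the final path segment.
FIRST_DOT = re.compile(r"/([^./]*)[^/]*$")


def extract(pattern, url):
    """Return Group 1 of the first match, or None if nothing matches."""
    match = pattern.search(url)
    return match.group(1) if match else None


urls = [
    "https://www.website.com/cat1/cat2/naming/id.html",
    "https://example.com/files/archive.tar.gz",  # hypothetical: two dots
    "https://example.com/cat/item",              # hypothetical: no extension
]

for u in urls:
    print(u, extract(LAST_DOT, u), extract(FIRST_DOT, u))
```

With these inputs, the first pattern returns archive.tar for the second URL and no match for the third, while the second pattern returns archive and item. Which one fits depends on whether ids can contain dots and whether every URL has an extension.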

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1