How to extract slug from URL with regular expression in Python?
I'm struggling with Python's re. I don't know how to solve the following problem in a clean way.
I want to extract part of a URL (the number 123456 in the example below).
What I tried so far:
import re

url = 'http://www.example.com/this-2-me-4/123456-subj'
m = re.search('/[0-9]+-', url)
m = m.group(0).rstrip('-')  # group(0) is the whole match, '/123456-'
m = m.lstrip('/')
This leaves me with the desired output 123456, but I feel this is not the proper way to extract the slug.
How can I do this more quickly and cleanly?
Solution 1:[1]
Use a capturing group by putting parentheses around the part of the regex that you want to capture (...). You can get the contents of a capturing group by passing in its number as an argument to m.group():
>>> import re
>>> url = 'http://www.example.com/this-2-me-4/123456-subj'
>>> m = re.search('/([0-9]+)-', url)
>>> m.group(1)
'123456'
From the docs:
(...)
Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group; the contents of a group can be retrieved after a match has been performed, and can be matched later in the string with the \number special sequence, described below. To match the literals '(' or ')', use \( or \), or enclose them inside a character class: [(], [)].
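A small variation on the answer above: a named group, written (?P<name>...), lets you retrieve the capture by name as well as by position, and comparing it with group(0) shows why the question needed rstrip/lstrip. The group name id is an arbitrary choice for illustration, and the None check is added here because re.search returns None when nothing matches.

```python
import re

url = 'http://www.example.com/this-2-me-4/123456-subj'

# Same pattern as above, but the digits are captured under the (illustrative) name 'id'
m = re.search(r'/(?P<id>[0-9]+)-', url)
if m is None:
    # re.search returns None when the pattern occurs nowhere in the string
    raise ValueError('no numeric ID found')

whole = m.group(0)       # entire match, slash and hyphen included: '/123456-'
by_number = m.group(1)   # named groups are still numbered left to right: '123456'
by_name = m.group('id')  # the same group, retrieved by its name: '123456'
print(whole, by_number, by_name)
```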
Solution 2:[2]
You may want to use urllib.parse combined with a capturing group for mildly cleaner code.
import re
import urllib.parse

url = 'http://www.example.com/this-2-me-4/123456-subj'
parsed = urllib.parse.urlparse(url)
path = parsed.path  # '/this-2-me-4/123456-subj'
slug = re.search(r'/(\d+)-', path).group(1)
print(slug)
Result:
123456
In Python 2, use urlparse instead of urllib.parse.
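If the ID is always the leading number of the last path segment (an assumption: the URL in the question fits it, but other URLs may not), you could skip the regex entirely and use plain string methods on the parsed path. This is a sketch of that regex-free alternative, not what the answer above does; leading_id is a hypothetical helper name.

```python
from urllib.parse import urlparse


def leading_id(url):
    """Return the digits before the first hyphen of the last path segment, or None."""
    # Parsing first keeps the query string and fragment out of the way
    path = urlparse(url).path
    # Last non-empty segment, e.g. '123456-subj' (a trailing slash is tolerated)
    segment = path.rstrip('/').rsplit('/', 1)[-1]
    # Text before the first hyphen, e.g. '123456'
    head = segment.split('-', 1)[0]
    # Only accept it if it is all digits
    return head if head.isdigit() else None


print(leading_id('http://www.example.com/this-2-me-4/123456-subj'))
print(leading_id('http://www.example.com/this-2-me-4/123456-subj/?ref=home'))
```

Unlike the regex, which finds the first /digits- anywhere in the path, this only looks at the final segment, so choose whichever matches how your URLs are actually built.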
Solution 3:[3]
If you want to find all the slugs in a URL, you can use the following code. It relies on the third-party python-slugify package (pip install python-slugify).
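from slugify import slugify

parts = "https://www.allrecipes.com/recipe/79300/real-poutine?search=random/some-name/".split("/")
for part in parts:
    # Drop any query string attached to this segment
    part = part.split("?")[0]
    # Keep segments that contain a hyphen and are already in slug form
    if "-" in part and slugify(part) == part:
        print(part)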
This prints:
real-poutine
some-name
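If you would rather not add the third-party dependency, a regular expression can approximate the same check for plain ASCII URLs. This is a swapped-in technique, not the answer's method: the pattern below is an assumption about what counts as a slug (lowercase letters and digits joined by single hyphens) and does not replicate everything slugify does, such as transliterating non-ASCII characters. find_slugs and SLUG_RE are hypothetical names.

```python
import re

# Assumed slug shape: runs of lowercase ASCII letters/digits separated by single
# hyphens, with at least one hyphen (like the "-" in part check above)
SLUG_RE = re.compile(r'[a-z0-9]+(?:-[a-z0-9]+)+')


def find_slugs(url):
    """Return every '/'-separated segment of url that looks like a slug."""
    slugs = []
    for part in url.split('/'):
        # Drop a query string glued to this segment, as the answer does
        part = part.split('?')[0]
        # fullmatch requires the whole segment to fit the pattern
        if SLUG_RE.fullmatch(part):
            slugs.append(part)
    return slugs


print(find_slugs("https://www.allrecipes.com/recipe/79300/real-poutine?search=random/some-name/"))
```

On the question's URL it returns both hyphenated segments, this-2-me-4 and 123456-subj, so a capturing group is still the simpler tool for pulling out just the number.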
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | senshin |
| Solution 3 | |
