Reuse part of a Regex pattern
Consider this (very simplified) example string:
1aw2,5cx7
As you can see, it consists of two digit/letter/letter/digit values separated by a comma.
Now, I could match this with the following:
>>> from re import match
>>> match(r"\d\w\w\d,\d\w\w\d", "1aw2,5cx7")
<_sre.SRE_Match object at 0x01749D40>
>>>
The problem, though, is that I have to write \d\w\w\d twice. With small patterns this isn't so bad, but with more complex regexes, writing the exact same thing twice makes the final pattern enormous and cumbersome to work with. It also seems redundant.
I tried using a named capture group:
>>> from re import match
>>> match(r"(?P<id>\d\w\w\d),(?P=id)", "1aw2,5cx7")
>>>
But it didn't match, because (?P=id) is a backreference: it looks for a second occurrence of the exact text 1aw2, not for another digit/letter/letter/digit value.
Is there any way to save part of a pattern, such as \d\w\w\d, so it can be used later on in the same pattern? In other words, can I reuse a sub-pattern in a pattern?
Solution 1:[1]
Note: this works with the third-party regex module from PyPI, not with the standard-library re module.
You can use the (?N) notation, where N is a group number, to reuse that group's pattern. In your case:
(\d\w\w\d),(?1)
It matches the same strings as:
(\d\w\w\d),(\d\w\w\d)
Be aware that \w also matches digits (and the underscore), so if the two middle characters must be letters, the regex should be:
(\d[a-zA-Z]{2}\d),(?1)
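A minimal sketch of the difference between the two modules, assuming the third-party regex package may or may not be installed (pip install regex): the standard re module rejects (?1) with re.error, while regex, when available, matches both values from a single group definition.

```python
import re

PATTERN = r"(\d[a-zA-Z]{2}\d),(?1)"

# The standard-library re module has no subroutine calls, so compiling
# this pattern raises re.error.
try:
    re.compile(PATTERN)
    re_supports_subroutines = True
except re.error:
    re_supports_subroutines = False
print(re_supports_subroutines)  # False

try:
    import regex  # third-party package: pip install regex
except ImportError:
    regex = None

if regex is not None:
    # (?1) re-runs group 1's pattern, so the two values may differ.
    compiled = regex.compile(PATTERN)
    print(bool(compiled.fullmatch("1aw2,5cx7")))  # True
    print(bool(compiled.fullmatch("1aw2,5c_7")))  # False: '_' is not a letter
```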
Solution 2:[2]
I ran into the same problem and wrote this snippet:
import nre
my_regex = nre.from_string(r'''
a=\d\w\w\d
b={{a}},{{a}}
c=(?P<id>{{a}}),(?P=id)
''')
my_regex["b"].match("1aw2,5cx7")
For lack of more descriptive names, I named the partial regexes a, b and c.
Any earlier definition can be referenced inside a later one with {{a}}.
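The internals of nre aren't shown in this answer, so the following is a hypothetical, standard-library-only sketch of the same idea, not nre's actual code: each name=pattern line is stored after replacing every {{name}} with the previously expanded source, wrapped in a non-capturing group. The from_string name here just mirrors the snippet above.

```python
import re


def from_string(definitions):
    """Compile 'name=pattern' lines, expanding {{name}} references.

    Hypothetical helper illustrating the idea; it is not nre's actual code.
    """
    expanded = {}
    for line in definitions.strip().splitlines():
        # Split on the first '=' only, so patterns may contain '=' too.
        name, _, body = line.partition("=")
        # Substitute each {{ref}} with the already-expanded source of ref,
        # wrapped in a non-capturing group so it behaves as one unit.
        body = re.sub(
            r"\{\{(\w+)\}\}",
            lambda m: "(?:" + expanded[m.group(1)] + ")",
            body,
        )
        expanded[name.strip()] = body
    return {name: re.compile(source) for name, source in expanded.items()}


patterns = from_string(r"""
a=\d\w\w\d
b={{a}},{{a}}
c=(?P<id>{{a}}),(?P=id)
""")
print(bool(patterns["b"].match("1aw2,5cx7")))  # True
print(bool(patterns["c"].match("1aw2,5cx7")))  # False
print(bool(patterns["c"].match("1aw2,1aw2")))  # True
```

References must be defined on an earlier line; an unknown name raises KeyError in this sketch.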
Solution 3:[3]
import re

# Compile the sub-pattern once so it can be reused.
digit_letter_letter_digit = re.compile(r"\d\w\w\d")

# findall returns every non-overlapping match as a string: ['1aw2', '5cx7'];
# use finditer instead if you need match objects rather than strings.
all_finds = digit_letter_letter_digit.findall("1aw2,5cx7")
for value in all_finds:
    print(digit_letter_letter_digit.match(value))
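Note that findall locates each value anywhere in the string rather than checking the overall comma-separated shape. A minimal sketch that reuses the same compiled pattern for validation, assuming values are separated by exactly one comma, splits the string and requires each piece to fullmatch. The is_value_pair name is just for illustration.

```python
import re

# The reusable sub-pattern, compiled once.
VALUE = re.compile(r"\d\w\w\d")


def is_value_pair(text):
    """Return True if text is exactly two VALUEs separated by one comma."""
    parts = text.split(",")
    # fullmatch anchors at both ends, so extra characters cause a failure.
    return len(parts) == 2 and all(VALUE.fullmatch(part) for part in parts)


print(is_value_pair("1aw2,5cx7"))       # True
print(is_value_pair("1aw2,5cx7,9ab0"))  # False: three values
print(is_value_pair("1aw2;5cx7"))       # False: wrong separator
```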
Solution 4:[4]
Since a pattern passed to re is just a string, why not use ordinary string processing to manage the pattern repetition as well:
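import re
# Every "P" in the template is replaced, so the template must not contain other P characters.
pattern = "P,P".replace("P", r"\d\w\w\d")
re.match(pattern, "1aw2,5cx7")
Or, with an f-string: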
P = r"\d\w\w\d"
re.match(f"{P},{P}", "1aw2,5cx7")
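One caveat with the f-string version: literal braces in the f-string template itself (for example a {m,n} quantifier) must be doubled, while braces inside the interpolated sub-pattern need no escaping. The sketch below also wraps each copy of P in a non-capturing group so a quantifier applies to the whole value; the {1,3} repeat count is an arbitrary illustrative choice.

```python
import re

# The reusable sub-pattern; its {2} needs no escaping because it is part
# of the interpolated value, not of the f-string template.
P = r"\d[a-zA-Z]{2}\d"

# One value followed by 1 to 3 more comma-separated values.
# The quantifier's braces are literal template text, so they are doubled.
pattern = re.compile(rf"(?:{P})(?:,(?:{P})){{1,3}}")

print(pattern.pattern)  # (?:\d[a-zA-Z]{2}\d)(?:,(?:\d[a-zA-Z]{2}\d)){1,3}
print(bool(pattern.fullmatch("1aw2,5cx7")))       # True
print(bool(pattern.fullmatch("1aw2,5cx7,9ab0")))  # True
print(bool(pattern.fullmatch("1aw2")))            # False
```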
Solution 5:[5]
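Try using back referencing:
(\d\w\w\d),\1
Be aware that \1 matches the exact text captured by group 1, not its pattern, so this matches 1aw2,1aw2 but not 1aw2,5cx7; it has the same limitation as (?P=id) in the question.
See http://www.regular-expressions.info/backref.html for reference.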
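A quick check with the standard re module illustrates what a backreference does and does not match:

```python
import re

# \1 refers back to the text captured by group 1, not to its pattern.
backref = re.compile(r"(\d\w\w\d),\1")

print(bool(backref.fullmatch("1aw2,1aw2")))  # True: same text repeated
print(bool(backref.fullmatch("1aw2,5cx7")))  # False: different second value
```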
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Uri Goren |
| Solution 3 | Uddhav P. Gautam |
| Solution 4 | |
| Solution 5 | Srb1313711 |
