re.sub replace with matched content
I'm getting to grips with regular expressions in Python and am trying to highlight part of a URL by wrapping it in HTML. My input is
images/:id/size
my output should be
images/<span>:id</span>/size
If I do this in JavaScript
method = 'images/:id/size';
method = method.replace(/\:([a-z]+)/, '<span>$1</span>')
alert(method)
I get the desired result, but if I do this in Python
>>> import re
>>> method = 'images/:id/huge'
>>> re.sub(r'\:([a-z]+)', '<span>$1</span>', method)
'images/<span>$1</span>/huge'
I don't. How do I get Python to substitute the matched group rather than a literal $1? Is re.sub even the right function for this?
Solution 1:[1]
Use \1 instead of $1.
From the documentation: "\number: Matches the contents of the group of the same number."
http://docs.python.org/library/re.html#regular-expression-syntax
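As a minimal sketch of this answer applied to the question's input: the pattern below moves the colon inside the capture group (my adjustment, not part of the original answer) so that it survives in the output, and both the pattern and the replacement are raw strings so that \1 reaches re.sub intact.

```python
import re

method = 'images/:id/size'

# Group 1 captures the colon together with the name, so \1 re-inserts both.
highlighted = re.sub(r'(:[a-z]+)', r'<span>\1</span>', method)

print(highlighted)  # images/<span>:id</span>/size
```

With the colon left outside the group, as in the question's pattern, the same call would give images/<span>id</span>/size instead.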
Solution 2:[2]
A backreference to the whole match value is \g<0>; see the re.sub documentation:
The backreference \g<0> substitutes in the entire substring matched by the RE.
See the Python demo:
import re
method = 'images/:id/huge'
print(re.sub(r':[a-z]+', r'<span>\g<0></span>', method))
# => images/<span>:id</span>/huge
If you need to perform a case-insensitive search, add flags=re.I:
re.sub(r':[a-z]+', r'<span>\g<0></span>', method, flags=re.I)
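To see what the flag changes, here is a small sketch using an upper-case placeholder (the input 'images/:ID/huge' is a made-up variation on the question's string). It is also worth passing flags by keyword: the fourth positional parameter of re.sub is count, so a flag passed positionally would be read as a replacement limit rather than as a flag.

```python
import re

# Hypothetical input with an upper-case placeholder, for illustration only.
method = 'images/:ID/huge'

# Without the flag, [a-z]+ cannot match 'ID', so the string is unchanged.
without_flag = re.sub(r':[a-z]+', r'<span>\g<0></span>', method)

# With flags=re.I the character class also matches upper-case letters.
with_flag = re.sub(r':[a-z]+', r'<span>\g<0></span>', method, flags=re.I)

print(without_flag)  # images/:ID/huge
print(with_flag)     # images/<span>:ID</span>/huge
```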
Solution 3:[3]
For the replacement portion, Python uses \1 the way sed and vi do, not $1 the way Perl, Java, and JavaScript (amongst others) do. Furthermore, because \1 in a regular string literal is interpreted as the character U+0001, you need to use a raw string or escape the backslash (\\1).
Python 3.2 (r32:88445, Jul 27 2011, 13:41:33)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> method = 'images/:id/huge'
>>> import re
>>> re.sub(':([a-z]+)', r'<span>\1</span>', method)
'images/<span>id</span>/huge'
>>>
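The difference between the three ways of writing the replacement can be checked directly. This is a sketch using the same pattern as the session above; the variable names are mine.

```python
import re

method = 'images/:id/huge'

# Raw string: the backslash and the digit both reach re.sub.
raw = re.sub(r':([a-z]+)', r'<span>\1</span>', method)

# Regular string with the backslash escaped: equivalent to the raw form.
escaped = re.sub(r':([a-z]+)', '<span>\\1</span>', method)

# Regular string, unescaped: Python turns \1 into U+0001 before re.sub
# ever sees it, so a control character is inserted instead of the group.
unescaped = re.sub(r':([a-z]+)', '<span>\1</span>', method)

print(repr(raw))        # 'images/<span>id</span>/huge'
print(repr(escaped))    # 'images/<span>id</span>/huge'
print(repr(unescaped))  # 'images/<span>\x01</span>/huge'
```

Note that the colon is dropped in all three results because it sits outside the capture group in this pattern.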
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | |
| Solution 3 | tchrist |
