Python regex string escaping for re.sub replace argument? [duplicate]

With the re module, the search pattern can be escaped so that it is matched literally, e.g.:

def my_replace(string, src, dst):
    import re
    return re.sub(re.escape(src), dst, string)

While this works for the most part, the dst string may include "\\9", for example.

This causes two issues:

  • \\1, \\2, etc. in dst are interpreted as group references rather than as literal text.
  • Using re.escape(dst) also escapes other characters, e.g. . is changed to \..
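
The two issues can be reproduced in isolation. The snippet below is a minimal sketch; the variable names (doubled, rejected, escaped) are only illustrative, and the exact wording of the error message may differ between Python versions.

```python
import re

# With a capturing group in the pattern, "\\1" in the replacement is a
# reference to that group, not literal text.
doubled = re.sub(r"(Foo)", "\\1\\1", "My Foo")
print(doubled)

# Without a matching group, the same kind of replacement is rejected outright.
try:
    re.sub(re.escape("Foo"), "\\1", "My Foo")
    rejected = False
except re.error as exc:
    rejected = True
    print("re.error:", exc)

# re.escape targets search patterns, so it escapes "." as well.
escaped = re.escape("Bar.")
print(escaped)
```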

Is there a way to escape the destination without introducing redundant character escaping?


Example usage:

>>> my_replace("My Foo", "Foo", "Bar")
'My Bar'

So far, so good.


>>> my_replace("My Foo", "Foo", "Bar\\Baz")
...
re.error: bad escape \B at position 3

re.sub tries to interpret \B as an escape sequence in the replacement string.


>>> my_replace("My Foo", "Foo", re.escape("Bar\\Baz"))
'My Bar\\Baz'

Works!


>>> my_replace("My Foo", "Foo", re.escape("Bar\\Baz."))
'My Bar\\Baz\\.'

The . gets escaped, which we don't want.


While str.replace would suffice in this case, the question of how to escape the destination string is still relevant, since there may be times when we want other features of re.sub, such as the ability to ignore case.



Solution 1:[1]

In the replacement string, only the backslash is interpreted as a special character, so instead of re.escape you can use a simple replacement on the destination argument:

def my_replace(string, src, dst):
    import re
    return re.sub(re.escape(src), dst.replace("\\", "\\\\"), string)
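
As a hedged sketch of how this fits the case-insensitive use mentioned in the question: the flags parameter and the my_replace_callable variant below are additions for illustration, not part of the original answer. The second variant relies on the fact that when re.sub is given a function as the replacement, the function's return value is inserted without any backslash processing.

```python
import re


def my_replace(string, src, dst, flags=0):
    # Double the backslashes in dst: they are the only characters that
    # re.sub treats specially in a replacement string.
    return re.sub(re.escape(src), dst.replace("\\", "\\\\"), string, flags=flags)


def my_replace_callable(string, src, dst, flags=0):
    # Alternative (assumption: not from the original answer): a callable
    # replacement returns dst as-is, so no escaping is needed.
    return re.sub(re.escape(src), lambda match: dst, string, flags=flags)


print(my_replace("My Foo", "Foo", "Bar\\Baz."))
print(my_replace("My foo", "FOO", "\\1", flags=re.IGNORECASE))
print(my_replace_callable("My foo", "FOO", "\\1", flags=re.IGNORECASE))
```

Both variants leave the . and the \1 in dst untouched, which is the behaviour the question asks for.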

Solution 2:[2]

You could resort to split:

haystack = r"some text with stu\ff to replace"
needle = r"stu\ff"
replacement = r"foo.bar"

result = replacement.join(re.split(re.escape(needle), haystack))
print(result)

This should also work with needle at the beginning or end of haystack.
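
The split-and-join idea can be wrapped in a small function to check the edge cases. This is a sketch only: the name replace_by_split and its flags parameter are hypothetical. Because re.escape(needle) contains no capturing groups, re.split returns just the text between matches, and str.join inserts the replacement verbatim.

```python
import re


def replace_by_split(haystack, needle, replacement, flags=0):
    # Split on the literal needle, then glue the pieces back together with
    # the replacement; join never interprets backslashes or group references.
    return replacement.join(re.split(re.escape(needle), haystack, flags=flags))


# Needle at both the beginning and the end of the haystack.
edges = replace_by_split(r"stu\ff in the middle stu\ff", r"stu\ff", r"\1.")
print(edges)

# Case-insensitive matching still works through the flags argument.
ignore_case = replace_by_split("My FOO", "foo", "Bar\\Baz", flags=re.IGNORECASE)
print(ignore_case)
```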

Solution 3:[3]

Your code works fine if you just remove the re.escape; I'm not sure why it is there:

Test 1

import re 

def my_replace(src, dst, string):
    return re.sub(src, dst, string)


string = 'abbbbbb'
src = r'(ab)b+'
dst = r'\1z'

print(my_replace(src, dst, string))

Output 1

abz

Test 2

import re


def my_replace(src, dst, string):
    return re.sub(src, dst, string)


string = re.escape("abbbbbbBar\\Baz")
src = r'(ab)b+'
dst = r'\1z'

print(my_replace(src, dst, string))

Output 2

abzBar\\Baz

Test 3

import re


def my_replace(src, dst, string):
    return re.sub(src, dst, string)


string = re.escape("abbbbbbBar\\Baz")
src = r'(ab)b+'
dst = r'\1' + re.escape('\\z')

print(my_replace(src, dst, string))

Output 3

ab\zBar\\Baz

Test 4

To construct dst, we first have to know whether the replacement uses any group references, such as \1 in this case. We cannot pass \1 through re.escape, because it would become \\1 and be inserted as literal text. Instead, write the group references unescaped, then append the parts that need re.escape.

import re


def my_replace(src, dst, string):
    return re.sub(src, dst, string)


string = re.escape("abbbbbbBar\\Baz")
src = r'(ab)b+'
dst = r'\1' + re.escape(r'\9z')  # raw string: '\9' is not a valid escape in a normal string literal

print(my_replace(src, dst, string))

Output 4

ab\9zBar\\Baz
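
The construction in Test 4 can be sketched with a small helper. The name literal is hypothetical; it doubles backslashes instead of calling re.escape, so that characters such as . in the literal part are not escaped. The second call shows what re.escape does to the same literal part for comparison.

```python
import re


def literal(text):
    # Hypothetical helper: make text safe to embed in a re.sub replacement
    # string by doubling backslashes, the only special character there.
    return text.replace("\\", "\\\\")


src = r"(ab)b+"

# Group reference left unescaped, literal part protected by the helper.
with_helper = re.sub(src, r"\1" + literal(r"\9z."), "abbbbbbBar")
print(with_helper)

# Same replacement built with re.escape: the "." picks up a backslash.
with_escape = re.sub(src, r"\1" + re.escape(r"\9z."), "abbbbbbBar")
print(with_escape)
```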

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1
Solution 2    blubberdiblub
Solution 3