Convert JS regex into Python regex

I'm working on a part of a project that replaces http URLs with https URLs where possible.

The problem is that the regular expressions for this are written for the JavaScript regex engine, but I'm using them in Python. To be compatible, I want to rewrite each regex into a valid Python regex while parsing.

For example, I am given this replacement pattern:

https://$1wikimediafoundation.org/

and I would like a pattern like this:

https://\1wikimediafoundation.org/

My problem is that I don't know how to do that (converting $ into \).


This code doesn't work:

'https://$1wikimediafoundation.org/'.replace('$', '\')

It generates the following error:

SyntaxError: EOL while scanning string literal

This code works without error:

'https://$1wikimediafoundation.org/'.replace('$', '\\')

but generates the wrong output:

'https://\\1wikimediafoundation.org/'


Solution 1:[1]

Actually it works:

>>> 'https://$1wikimediafoundation.org/'.replace('$', '\\')
'https://\\1wikimediafoundation.org/'
>>> print('https://$1wikimediafoundation.org/'.replace('$', '\\'))
https://\1wikimediafoundation.org/

When you evaluate 'https://$1wikimediafoundation.org/'.replace('$', '\\') in the interactive interpreter, it displays the __repr__ (representation) of the string, in which special characters such as the backslash are shown escaped. The string itself contains only one backslash.

By printing it, you use __str__, the readable version. (See this answer on __str__ vs __repr__)
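
The difference can be checked directly. The following is a minimal sketch in Python 3 syntax; the `pattern` and the test URL are made up for illustration and are not taken from the question's rule set.

```python
import re

original = 'https://$1wikimediafoundation.org/'
converted = original.replace('$', '\\')

# repr() shows the escaped form, print() shows the actual characters.
print(repr(converted))   # 'https://\\1wikimediafoundation.org/'
print(converted)         # https://\1wikimediafoundation.org/

# One character ('$') was swapped for one character ('\').
assert len(converted) == len(original)
assert converted.count('\\') == 1

# Hypothetical rule: the group captures an optional "www." prefix.
pattern = r'^http://(www\.)?wikimediafoundation\.org/'
result = re.sub(pattern, converted, 'http://www.wikimediafoundation.org/')
print(result)            # https://www.wikimediafoundation.org/
```

Since `re.sub` accepts the converted string as a working backreference template, the "wrong output" in the question is only a display effect.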

Solution 2:[2]

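Try this:

'https://$1wikimediafoundation.org/'.replace('$', '\\')

A raw string does not help here: a raw string literal cannot end in a single backslash, so r'\' is a syntax error. Writing '\\' escapes the backslash and yields the single backslash you are after.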
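
Raw strings are more useful for the conversion pattern itself. Here is a rough sketch that rewrites only a `$` followed by digits, rather than every `$`. The helper name `js_to_py_replacement` is hypothetical. Treating one or two digits as the group number is a simplification of the JavaScript rules, and `$$` and literal backslashes are not handled.

```python
import re

def js_to_py_replacement(template):
    # Hypothetical helper: rewrite JS-style $1..$99 as Python \g<1>..\g<99>.
    # In the replacement template, r'\\' emits one literal backslash
    # and \1 inserts the captured digits.
    return re.sub(r'\$(\d{1,2})', r'\\g<\1>', template)

py_template = js_to_py_replacement('https://$1wikimediafoundation.org/')
print(py_template)  # https://\g<1>wikimediafoundation.org/
```

The `\g<1>` form is used instead of `\1` so that a group reference followed by a literal digit in the template stays unambiguous.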

Solution 3:[3]

You can test your regex at https://regex101.com/ and then adapt it to Python. Additionally, to replace a matched group, you can use the re.sub function along these lines:

re.sub(r"'([^']*)'", r'{\1}', col)

which replaces

'Protein_Expectation_Value_Log(e)', 'Protein_Intensity_Log(I)'

with

{Protein_Expectation_Value_Log(e)}, {Protein_Intensity_Log(I)}
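
As a runnable version of this example, assuming `col` holds the quoted column names as a single string:

```python
import re

# Assumed input: one string containing the single-quoted names.
col = "'Protein_Expectation_Value_Log(e)', 'Protein_Intensity_Log(I)'"

# Capture the text between single quotes and wrap it in braces instead.
result = re.sub(r"'([^']*)'", r'{\1}', col)
print(result)  # {Protein_Expectation_Value_Log(e)}, {Protein_Intensity_Log(I)}
```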

You can read more here

Solution 4:[4]

Note that $& in replacement patterns should be converted to \g<0>, because \0 in a Python replacement string is the null character (\x00), not the whole match.
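
A short sketch of the difference follows. The helper name `convert_whole_match` is hypothetical, and the pattern and input are made up for illustration.

```python
import re

# r'\0' in a replacement template is an octal escape, i.e. the NUL character.
nul_result = re.sub(r'\d+', r'<\0>', 'abc123')
print(repr(nul_result))   # 'abc<\x00>'

# \g<0> inserts the whole match, which is what JS $& does.
whole_result = re.sub(r'\d+', r'<\g<0>>', 'abc123')
print(whole_result)       # abc<123>

def convert_whole_match(template):
    # Hypothetical helper: rewrite JS "$&" as Python "\g<0>".
    return template.replace('$&', r'\g<0>')

converted_result = re.sub(r'\d+', convert_whole_match('<$&>'), 'abc123')
print(converted_result)   # abc<123>
```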

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Community
Solution 2 shep
Solution 3 Anu
Solution 4 Wiktor Stribiżew