Python re.sub multiline on string
I am trying to use the re.MULTILINE flag.
I have read these posts: Bug in Python Regex? (re.sub with re.MULTILINE) and Python re.sub MULTILINE caret match, but my code still doesn't work. The code:
import re

if __name__ == '__main__':
    txt = "\n\
<?php\n\
/* Multi-line\n\
comment */\n\
$var = 1;\n"
    new_txt = re.sub(r'\/\*[.\n]*?\*\/', '', txt, flags=re.MULTILINE)
    print("\n=========== TXT ============")
    print(txt)
    print("\n=========== NEW TXT ============")
    print(new_txt)
The output:
=========== TXT ============
<?php
/* Multi-line
comment */
$var = 1;
=========== NEW TXT ============
<?php
/* Multi-line
comment */
$var = 1;
But new_txt should not contain the multi-line comment. I want to get txt without the multi-line comment. Do you have any idea?
Solution 1:[1]
You need to replace re.MULTILINE with re.DOTALL (or its alias re.S) and move the period out of the character class, because inside a character class the dot matches only a literal period (.).
Note that re.MULTILINE only redefines the behavior of ^ and $, which then match at the start and end of each line rather than only at the start and end of the whole string. The re.DOTALL flag only redefines the behavior of . outside a character class: with it, the dot also matches a newline.
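To make the difference between the two flags concrete, here is a minimal sketch. The sample strings and variable names are made up for illustration and are not part of the question.

```python
import re

sample = "first\nsecond"  # hypothetical sample text

# re.MULTILINE only changes where ^ and $ are allowed to match.
starts_default = re.findall(r'^\w+', sample)
starts_multiline = re.findall(r'^\w+', sample, flags=re.MULTILINE)

# re.DOTALL only changes what . matches outside a character class.
dot_default = re.search(r'first.second', sample)
dot_dotall = re.search(r'first.second', sample, flags=re.DOTALL)

# Inside [...], the dot is a literal period regardless of the flags.
literal_dot = re.findall(r'[.]', 'a.b\nc', flags=re.DOTALL)

print(starts_default)          # ['first']
print(starts_multiline)        # ['first', 'second']
print(dot_default)             # None
print(dot_dotall is not None)  # True
print(literal_dot)             # ['.']
```

This is why [.\n]*? in the question can only consume periods and newlines, and stops at the first letter of the comment.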
So, the regex you could use for the current example is /\*.*?\*/. It matches a literal /* with /\*, then .*? matches as few characters as possible up to the closing */, which is matched with \*/.
See the code demo:
import re

txt = """\n\
<?php\n\
/* Multi-line\n\
comment */\n\
$var = 1;\n"""
# re.S (re.DOTALL) lets . match newlines, so .*? can span several lines
new_txt = re.sub(r'/\*.*?\*/', '', txt, flags=re.S)
print("\n=========== TXT ============")
print(txt)
print("\n=========== NEW TXT ============")
print(new_txt)
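However, this is not the most efficient solution: multiline comments are often long, and the lazy .*? has to expand one character at a time, checking for */ at each step. A more efficient approach is the unrolling-the-loop technique. The regex above can be "unrolled" like this: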
/\*[^*]*(?:\*(?!/)[^*]*)*\*/
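As a rough sketch, the unrolled pattern can be passed to re.sub as follows. It needs no flag, because the negated class [^*] already matches newlines. The name COMMENT_RE and the sample strings are illustrative assumptions.

```python
import re

# Unrolled pattern: no lazy quantifier and no re.DOTALL needed,
# since the negated class [^*] matches newlines as well.
COMMENT_RE = re.compile(r'/\*[^*]*(?:\*(?!/)[^*]*)*\*/')

txt = "<?php\n/* Multi-line\ncomment */\n$var = 1;\n"
new_txt = COMMENT_RE.sub('', txt)
print(repr(new_txt))  # '<?php\n\n$var = 1;\n'

# Asterisks inside the comment body are handled by the \*(?!/) branch.
starred = COMMENT_RE.sub('', "/* a * b **/x")
print(starred)  # x
```

Note that the newline that followed the comment stays in place, so an empty line remains where the comment was.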
Solution 2:[2]
In my case, I had many lines, and my aim was to replace a known sentence that could span more than one line within a multi-line block. All I had to do was re.escape() the sentence; no multi-line flags are needed. Also, the trailing backslashes and even the "\n" escapes are unnecessary in the input if the string already contains the line breaks, as in a triple-quoted string ("""...""").
import re

txt = """<?php
/* Multi-line
comment */
$var = 1;"""
# re.escape() turns the known sentence into a pattern that matches it literally
new_txt = re.sub(re.escape(r"""/* Multi-line
comment */"""), '', txt)
print("\n=========== TXT ============")
print(txt)
print("\n=========== NEW TXT ============")
print(new_txt)
>>> print("\n=========== TXT ============")
=========== TXT ============
>>> print(txt)
<?php
/* Multi-line
comment */
$var = 1;
>>> print("\n=========== NEW TXT ============")
=========== NEW TXT ============
>>> print(new_txt)
<?php
$var = 1;
Perhaps this helps someone whose problem is only similar to the one in the question.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | questionto42standswithUkraine |
