How to match a newline character in a raw string?
I am a little confused about Python raw strings. I know that in a raw string, '\' is treated as a literal backslash (e.g. r'\n' is the two characters \ and n). However, what if I want to match a newline character using a raw string? I tried r'\\n', but it didn't work.
Does anybody have a good idea about this?
Solution 1:[1]
The simplest answer is to not use a raw string. In a normal string, \n is a real newline, and you can write a literal backslash as \\.
If you have huge numbers of backslashes in some segments, then you could concatenate raw strings and normal strings as needed:
r"some string \ with \ backslashes" "\n"
(Python automatically concatenates string literals with only whitespace between them.)
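The concatenation described above can be sketched as follows. The variable names are illustrative only and do not come from the original answer.

```python
# Adjacent string literals are joined into one string by the parser,
# so a raw literal and a normal literal can sit side by side.
raw_part = r"some string \ with \ backslashes"
combined = r"some string \ with \ backslashes" "\n"

# The raw part keeps its backslashes; the normal part adds a real newline.
print(combined == raw_part + "\n")  # True
print(len(r"\n"), len("\n"))        # 2 1
```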
Remember that if you are working with paths on Windows, the easiest option is to use forward slashes; they still work fine.
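As a small illustration of the forward-slash point, the sketch below uses pathlib.PureWindowsPath, which applies Windows path rules on any operating system. The path itself is a made-up example.

```python
from pathlib import PureWindowsPath

# Hypothetical path, written once with forward slashes and once
# as a raw string with backslashes.
forward = PureWindowsPath("C:/Users/example/notes.txt")
backward = PureWindowsPath(r"C:\Users\example\notes.txt")

# Both spellings describe the same Windows path.
same_path = forward == backward
print(same_path)     # True
print(forward.name)  # notes.txt
```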
Solution 2:[2]
You can also use the character class [\r\n] to match a line break (either a carriage return or a line feed).
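Assuming the matching is done with the standard re module (the question does not say which API is used), here is a minimal sketch of how these patterns behave. The sample text and variable names are illustrative assumptions.

```python
import re

text = "first line\nsecond line\r\nthird line"

# The regex engine itself interprets the two characters \n as a newline,
# so a raw-string pattern still matches a real newline character.
newline_hits = re.findall(r"\n", text)

# The character class matches a carriage return or a line feed.
lines = re.split(r"[\r\n]+", text)

# r"\\n" is a regex for a literal backslash followed by 'n',
# which is why it does not match a newline.
literal_hit = re.search(r"\\n", "a\\nb")
newline_miss = re.search(r"\\n", "a\nb")

print(len(newline_hits))         # 2
print(lines)                     # ['first line', 'second line', 'third line']
print(literal_hit is not None)   # True
print(newline_miss is None)      # True
```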
Solution 3:[3]
import re
from string import punctuation

def clean_with_puncutation(text):
    # Map every punctuation character to a placeholder token.
    punctuation_token = {p: '<PUNC_' + p + '>' for p in punctuation}
    punctuation_token['<br/>'] = "<TOKEN_BL>"
    punctuation_token['\n'] = "<TOKEN_NL>"
    punctuation_token['<EOF>'] = '<TOKEN_EOF>'
    punctuation_token['<SOF>'] = '<TOKEN_SOF>'
    # Always put the multi-character tokens first to avoid overlapping results.
    # Inside the raw string, \n is interpreted by the regex engine as a newline.
    regex = r"(<br/>)|(<EOF>)|(<SOF>)|[\n\!\@\#\$\%\^\&\*\(\)\[\]\{\}\;\:\,\.\/\?\|\`\_\\+\\\=\~\-\<\>]"
    # Example input:
    # text = '<EOF>!@#$%^&*()[]{};:,./<>?\|`~-= _+\<br/>\n <SOF>\ '
    text_ = ""
    index = 0
    for match in re.finditer(regex, text):
        # Copy the text before the match, then the token padded with spaces.
        text_ = text_ + text[index:match.start()] + " " + punctuation_token[match.group()] + " "
        index = match.end()
    # Append whatever follows the last match so trailing text is not lost.
    return text_ + text[index:]
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Gareth Latty |
| Solution 2 | Mohammad Hossein zare mehrjard |
| Solution 3 | |
