New line symbols are not detected by regex
I import data from a CSV file and read it with pandas:
import re
import pandas as pd

train = pd.read_csv('yelp_review_full_csv/train.csv',
                    header=None,
                    names=['Class', 'Review'])
reviews = train['Review']
and I want to get rid of the newline symbols (\n) using regex:
print(reviews[3])
rex = re.sub("\\n+", " ", reviews[3])
print(rex)
which gives me this output (the text is unchanged):
... much. \n\nI think ...
... much. \n\nI think ...
If I copy the output and check it with the same regex, I get the desired result. I guess the problem is in how the CSV is read. Any recommendations?
Solution 1:[1]
Your text contains the literal characters \n (a backslash followed by n), not newlines.
The regexp \n matches a newline, not a literal \n. To match a literal \n, the regexp parser has to see \\n. In the Python string "\\n+", escaping the backslash only lets a single backslash through to the regexp parser, which then sees \n+ and matches newlines. You need to double-escape it in the string ("\\\\n") so that the regexp matches a literal \n, or use a raw string:
rex = re.sub(r"(\\n)+", " ", reviews[3])
See What exactly is a "raw string regex" and how can you use it?
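The difference between the two patterns can be checked without the CSV file. The following is a minimal sketch, assuming a made-up sample string (`literal`) that imitates the review text, plus a comparison string (`real`) containing actual newlines. Neither string comes from the dataset; both are hypothetical and used only for illustration.

```python
import re

# Hypothetical sample imitating the review text: each "\\n" below is two
# characters (a backslash and the letter n), not a newline.
literal = "so much. \\n\\nI think"
# The same text with two real newline characters, for comparison.
real = "so much. \n\nI think"

# Pattern from the question: the regex engine receives \n+ and
# therefore matches real newline characters only.
newline_pattern = "\\n+"
# Corrected pattern: the raw string hands \\n to the regex engine,
# which matches a backslash followed by the letter n.
literal_pattern = r"(\\n)+"

unchanged = re.sub(newline_pattern, " ", literal)    # no match, text stays as-is
real_fixed = re.sub(newline_pattern, " ", real)      # real newlines are replaced
literal_fixed = re.sub(literal_pattern, " ", literal)  # literal \n\n is replaced

# repr() makes the difference visible: a literal backslash shows up doubled.
print(repr(unchanged))
print(repr(real_fixed))
print(repr(literal_fixed))
```

Printing `repr(reviews[3])` in the original script is a quick way to tell which case applies: a doubled backslash in the output suggests the text holds literal backslash characters, not newlines.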
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Barmar |
