Put spaces around all punctuation but excluding apostrophes
I'm new to this, so I'm sorry if this is a stupid question. I need help with a bit of code in R.
I have a bit of code (below) which puts a space around all punctuation in all txt files in a folder. It's lovely, but I don't want it to add spaces around apostrophes (').
Can anybody help me exclude apostrophes in the gsub("(\\.+|[[:punct:]])", " \\1 ", tx) call? Or is a negated class (with [^...]) how you would do it?
I get this: "I want : spaces around all these marks ; : ! ? . but i didn ’ t want it there in didn ’ t"
I want this: "I want : spaces around all these marks ; : ! ? . but i didn’t want it there in didn’t"
for(file in filelist){
tx=readLines(file)
tx2=gsub("(\\.+|[[:punct:]])", " \\1 ", tx)
writeLines(tx2, con=file)
}
Solution 1:[1]
You can use
tx <- "I want: spaces around all these marks;:!?.but i didn’t want it there in didn't"
gsub("\\s*(\\.+|[[:punct:]])(?<!\\b['’]\\b)\\s*", " \\1 ", tx, perl=TRUE)
## => [1] "I want : spaces around all these marks ;  :  !  ?  . but i didn’t want it there in didn't"
The perl=TRUE only means that the regex is handled with the PCRE library (note that PCRE regex engine is not the same as Perl regex engine).
Details:
- \s* - zero or more whitespace characters
- (\.+|[[:punct:]]) - Group 1 (\1): one or more dots, or a single punctuation character
- (?<!\b['’]\b) - immediately to the left, there must be no ' or ’ enclosed by word characters
- \s* - zero or more whitespace characters
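To plug this pattern into the file loop from the question, one option is to wrap it in a small helper. The following is only a sketch: the name space_punct is hypothetical (it is not part of the answer), the typographic apostrophe is written as the escape \u2019 so the script does not depend on the source file's encoding, and it assumes that collapsing runs of spaces is acceptable for your text, since adjacent marks each contribute their own padding. A temporary file stands in for the real filelist.

```r
# Hypothetical helper: pad punctuation with spaces, but leave an apostrophe
# (' or the typographic \u2019) alone when it sits between word characters.
space_punct <- function(x) {
  out <- gsub("\\s*(\\.+|[[:punct:]])(?<!\\b['\u2019]\\b)\\s*", " \\1 ", x,
              perl = TRUE)
  # Adjacent marks produce doubled spaces; collapse them and trim the ends.
  # Note: this also collapses any doubled spaces already present in the text.
  trimws(gsub(" {2,}", " ", out))
}

# Demonstration on a temporary file so that nothing real is overwritten.
tmp <- tempfile(fileext = ".txt")
writeLines("I want:spaces;:!?.but i didn't want it there", tmp)

for (file in tmp) {  # replace `tmp` with your own filelist
  tx <- readLines(file)
  writeLines(space_punct(tx), con = file)
}

result <- readLines(tmp)
print(result)
```

If the doubled spaces between adjacent marks are wanted, drop the second gsub() call and return out directly.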
Solution 2:[2]
We can match the apostrophe (here the typographic ’ used in the data) and skip it with (*SKIP)(*FAIL), then match all other punctuation:
gsub("’(*SKIP)(*FAIL)|([[:punct:].])", " \\1 ", tx, perl = TRUE)
-output
[1] "I want : spaces around all these marks ;  :  !  ?  .  but i didn’t want it there in didn’t"
data
tx <- "I want:spaces around all these marks;:!?. but i didn’t want it there in didn’t"
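The pattern above skips only the typographic apostrophe, so a straight ASCII apostrophe would still be padded. A possible variant, shown here as a sketch and not as part of the original answer, puts both forms in a character class before (*SKIP)(*FAIL). The sample string is made up for illustration, and \u2019 is the escape for ’.

```r
# Made-up sample containing both apostrophe forms.
tx <- "marks;:!?. but i didn't and didn\u2019t"

# Pattern from the answer: only the typographic apostrophe is skipped,
# so the ASCII apostrophe still gets spaces around it.
only_curly <- gsub("\u2019(*SKIP)(*FAIL)|([[:punct:].])", " \\1 ", tx,
                   perl = TRUE)

# Variant: skip both apostrophe forms, then pad all other punctuation.
both <- gsub("['\u2019](*SKIP)(*FAIL)|([[:punct:]])", " \\1 ", tx,
             perl = TRUE)

print(only_curly)
print(both)
```

Unlike the lookbehind in Solution 1, this variant leaves every ' untouched, including ones used as single quotation marks, so it fits best when the text uses ' only as an apostrophe.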
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Wiktor Stribiżew |
| Solution 2 |
