Put spaces around all punctuation, excluding apostrophes

I'm new to this, so I'm sorry if this is a stupid question... I need help with a bit of code in R...

I have a bit of code (below) which puts a space around all punctuation in all txt files in a folder. It's lovely, but I don't want it to add spaces around apostrophes (').

Can anybody help me exclude apostrophes in this bit: gsub("(\\.+|[[:punct:]])", " \\1 ", tx)? Or would you do it with a negated class, [^ ?

I get this: "I want : spaces around all these marks ; : ! ? . but i didn ’ t want it there in didn ’ t"

I want this: "I want : spaces around all these marks ; : ! ? . but i didn’t want it there in didn’t"

for(file in filelist){
  tx=readLines(file)
  tx2=gsub("(\\.+|[[:punct:]])", " \\1 ", tx)
  writeLines(tx2, con=file)
}



Solution 1:[1]

You can use

tx <- "I want: spaces around all these marks;:!?.but i didn’t want it there in didn't"
gsub("\\s*(\\.+|[[:punct:]])(?<!\\b['’]\\b)\\s*", " \\1 ", tx, perl=TRUE)
## => [1] "I want : spaces around all these marks ;  :  !  ?  . but i didn’t want it there in didn't"

The perl=TRUE argument only means that the regex is handled by the PCRE library (note that the PCRE regex engine is not the same as the Perl regex engine). It is needed here because R's default regex engine does not support the (?<!...) lookbehind.
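To illustrate why perl=TRUE matters, here is a minimal sketch using a made-up toy pattern and input (neither comes from the answer). It tries a lookbehind with the default engine, which is expected to reject the syntax, and then with PCRE:

```r
# Toy pattern and input, chosen only for illustration.
pattern <- "(?<!a)b"   # a "b" that is not preceded by "a"
x <- "ab cb"

# Default engine (TRE): lookbehind syntax is not supported, so gsub() errors.
# The error is caught and replaced by a marker string.
default_result <- tryCatch(
  suppressWarnings(gsub(pattern, "X", x)),
  error = function(e) "error"
)

# PCRE engine: the lookbehind is understood.
pcre_result <- gsub(pattern, "X", x, perl = TRUE)

print(default_result)
print(pcre_result)
```

Only the "b" after "c" is replaced in the PCRE call, since the one after "a" is excluded by the lookbehind.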

Details:

  • \s* - zero or more whitespaces
  • (\.+|[[:punct:]]) - Group 1 (\1): one or more dots, or a punctuation char
  • (?<!\b['’]\b) - a negative lookbehind: immediately on the left, there must be no ' or ’ enclosed with word chars (so an apostrophe inside a word is not matched)
  • \s* - zero or more whitespaces
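The pattern can be dropped into the loop from the question. The sketch below is one possible way to do that, not a tested drop-in: the temporary folder and the sample.txt file are hypothetical stand-ins for the asker's real folder, and the curly apostrophe is written as \u2019 so the script does not depend on the file's encoding.

```r
# Hypothetical setup: a temporary folder with one sample .txt file,
# standing in for the real folder of text files.
demo_dir <- tempfile("punct_demo_")
dir.create(demo_dir)
sample_file <- file.path(demo_dir, "sample.txt")
writeLines("Stop!Wait,i didn't mean it.", sample_file)

filelist <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

# Pattern from Solution 1: pad punctuation with spaces, but leave alone
# an apostrophe (' or \u2019) that sits between two word characters.
pattern <- "\\s*(\\.+|[[:punct:]])(?<!\\b['\u2019]\\b)\\s*"

for (file in filelist) {
  tx  <- readLines(file)
  tx2 <- gsub(pattern, " \\1 ", tx, perl = TRUE)
  writeLines(tx2, con = file)   # overwrites the file in place
}

result <- readLines(sample_file)
print(result)
```

Because the loop overwrites each file, it may be worth running it on a copy of the folder first.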

Solution 2:[2]

We may match the ’ and skip it with (*SKIP)(*FAIL) before matching all other punctuation:

gsub("’(*SKIP)(*FAIL)|([[:punct:].])", " \\1 ", tx, perl = TRUE)

-output

[1] "I want : spaces around all these marks ;  :  !  ?  .  but i didn’t want it there in didn’t"

data

tx <- "I want:spaces around all these marks;:!?. but i didn’t want it there in didn’t"
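The same idea can be extended to text that mixes straight and curly apostrophes. The following is a sketch under a few assumptions that are not part of the answer: the character class ['\u2019] is added to skip both apostrophe forms, \\.+ from the question's pattern is reused so that a run of dots stays together, and a final tidy-up step collapses the doubled spaces.

```r
# Sample input mixing a straight apostrophe (') and a curly one (\u2019).
tx <- "Wait...i didn't go;she didn\u2019t either!"

# Either apostrophe is matched first and discarded via (*SKIP)(*FAIL);
# any other punctuation is captured in group 1 and padded with spaces.
pattern <- "['\u2019](*SKIP)(*FAIL)|(\\.+|[[:punct:]])"

padded <- gsub(pattern, " \\1 ", tx, perl = TRUE)

# Optional tidy-up: collapse runs of spaces and trim both ends.
tidy <- trimws(gsub(" {2,}", " ", padded))
print(tidy)
```

Unlike the lookbehind in Solution 1, this version skips every apostrophe, including ones used as quotation marks, which may or may not be what is wanted.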

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Wiktor Stribiżew
Solution 2