Remove punctuation from text (except the symbol &)

I need to remove punctuation from the text:

 data <- "Type the command AT&W enter. in order to save the new protocol on modem;"
 gsub('[[:punct:] ]+',' ',data)

This solution gives the result

[1] "Type the command AT W enter in order to save the new protocol on modem "

This is not the desired result, because I would like to keep the &. What I want is:

[1] "Type the command AT&W enter in order to save the new protocol on modem "


Solution 1:[1]

What about doing the inverse? That is, replace everything that is not a letter, a digit, whitespace or an & with an empty string:

gsub("[^[:alnum:][:space:]&]", "", data)
# [1] "Type the command AT&W enter in order to save the new protocol on modem"

Solution 2:[2]

You could try a user-defined regex that matches anything that is not an &, an alphanumeric character or a space:

data <- "Type the command AT&W enter. in order to save the new protocol on modem;"

gsub('[^&[:alnum:] ]+',' ',data)
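
One thing to watch with this variant: each match is replaced by a space rather than by nothing, so the "." after "enter" leaves two consecutive spaces and the final ";" leaves a trailing space. A minimal sketch of one way to tidy that up, where the variable names `spaced` and `tidy` are only illustrative:

```r
data <- "Type the command AT&W enter. in order to save the new protocol on modem;"

# Replace each run of disallowed characters with a single space.
spaced <- gsub("[^&[:alnum:] ]+", " ", data)

# Collapse repeated spaces, then trim leading and trailing whitespace.
tidy <- trimws(gsub(" +", " ", spaced))
tidy
# [1] "Type the command AT&W enter in order to save the new protocol on modem"
```

Replacing with "" instead of " " avoids the extra step, but it would also glue words together wherever punctuation is the only separator (for example "a,b").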

Solution 3:[3]

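Here's another regex, which literally means "find all punctuation except &". \P{P} matches any character that is not Unicode punctuation, so the negated class [^\P{P}&] matches punctuation characters other than &.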

gsub("[^\\P{P}&]", "", data, perl = TRUE)
# [1] "Type the command AT&W enter in order to save the new protocol on modem"
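
Note that Unicode category P is narrower than the POSIX class [[:punct:]] used in the question: characters such as "$" and "+" are Unicode symbols (category S), not punctuation. A small comparison on a made-up string (`x` is not from the question), assuming R's PCRE was built with Unicode property support, as it usually is:

```r
x <- "Cost: $5 + tax (AT&W)!"

# Unicode punctuation (category P) except "&"; "$" and "+" are symbols, so they stay.
unicode_p <- gsub("[^\\P{P}&]", "", x, perl = TRUE)
unicode_p
# [1] "Cost $5 + tax AT&W"

# POSIX class for comparison: it also strips "$", "+" and "&".
posix_punct <- gsub("[[:punct:]]", "", x)
posix_punct
# [1] "Cost 5  tax ATW"
```

Which behaviour is wanted depends on whether symbols like "$" should count as punctuation in your text.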

Solution 4:[4]

Another possible solution, based on stringr:

library(stringr)
 
str_remove_all(data, "(?!&)[[:punct:]]")

#> [1] "Type the command AT&W enter in order to save the new protocol on modem"
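
If you would rather not load stringr, the same negative-lookahead idea should also work in base R, as long as perl = TRUE is set (the default regex engine does not support lookaheads). A sketch, with `no_punct` as an illustrative name:

```r
data <- "Type the command AT&W enter. in order to save the new protocol on modem;"

# (?!&) is a negative lookahead: the match is rejected when the next character is "&",
# so [[:punct:]] only consumes punctuation other than "&".
no_punct <- gsub("(?!&)[[:punct:]]", "", data, perl = TRUE)
no_punct
# [1] "Type the command AT&W enter in order to save the new protocol on modem"
```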

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 MatthewR
Solution 3 benson23
Solution 4