R: How to remove the part of a string that has a specific start and end in an R data frame?

I have a data frame like this:

df = data.frame(order = c(1,2,3), info = c("an apple","a banana[12],","456[Ab]"))

I want to clean it up by removing the square brackets [] and everything inside them, so that df$info becomes "an apple" "a banana" "456".

Please help...



Solution 1:[1]

Use gsub:

# Lazy .*? stops at the first "]"; the comma after "[12]" is kept ("a banana,")
df$info <- gsub("\\[.*?\\]", "", df$info)
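Because `.*?` is lazy, each match stops at the first `]`, so separate bracket groups are removed one at a time, while a greedy `.*` swallows everything from the first `[` to the last `]`. The sketch below illustrates this on a made-up string `s`, and also shows an assumed variant that appends `,?` so the comma after `[12]` goes too. It passes `perl = TRUE` so the lazy quantifier follows PCRE semantics; that choice is ours, not part of the answer above.

```r
# Same data as in the question
df <- data.frame(order = c(1, 2, 3), info = c("an apple", "a banana[12],", "456[Ab]"))

# Hypothetical string with two bracket groups
s <- "a[1] b[2] c"

lazy   <- gsub("\\[.*?\\]", "", s, perl = TRUE)  # each "[...]" removed separately
greedy <- gsub("\\[.*\\]", "", s, perl = TRUE)   # one match from first "[" to last "]"

# Variant: also drop an optional comma right after the closing bracket
cleaned <- gsub("\\[.*?\\],?", "", df$info, perl = TRUE)

print(lazy)     # "a b c"
print(greedy)   # "a c"
print(cleaned)  # "an apple" "a banana" "456"
```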

Solution 2:[2]

In base R, we can use trimws() and pass a regular expression as its whitespace argument:

df$info <- trimws(df$info, whitespace = "\\[.*")
df$info
[1] "an apple" "a banana" "456"     
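trimws() treats `whitespace` as a regular expression and strips matches anchored at the start or end of each string, so `"\\[.*"` removes everything from a `[` to the end. The sketch below compares it with a plain `sub()` call that anchors the same pattern at `$`; the extra string `"x[1] y"` is made up to show the caveat that any text after the brackets is lost as well.

```r
# Same data as in the question
df <- data.frame(order = c(1, 2, 3), info = c("an apple", "a banana[12],", "456[Ab]"))

# trimws() with a regex "whitespace": strips "[" and everything after it
via_trimws <- trimws(df$info, whitespace = "\\[.*")

# An explicit end-anchored sub() gives the same result for these strings
via_sub <- sub("\\[.*$", "", df$info)

# Caveat: text after the brackets is dropped too (hypothetical input)
tail_lost <- sub("\\[.*$", "", "x[1] y")

print(via_trimws)                      # "an apple" "a banana" "456"
print(identical(via_trimws, via_sub))  # TRUE
print(tail_lost)                       # "x"
```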

Solution 3:[3]

1.) This gives the expected output and also removes the trailing comma:

library(dplyr)
library(stringr)

df %>% 
  # drop "[...]" plus an optional comma, then trim surrounding spaces
  mutate(info = str_trim(str_replace_all(info, "(\\[.*\\])\\,?", "")))
  order     info
1     1 an apple
2     2 a banana
3     3      456
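If loading dplyr and stringr is not wanted, the same cleanup can be sketched in base R, with gsub() and trimws() standing in for str_replace_all() and str_trim(). This is an assumed translation rather than the answerer's code; the pattern is written with `,?` instead of `\\,?` because the comma needs no escaping.

```r
# Same data as in the question
df <- data.frame(order = c(1, 2, 3), info = c("an apple", "a banana[12],", "456[Ab]"))

# Greedy "[...]" block plus an optional trailing comma, then trim spaces
df$info <- trimws(gsub("\\[.*\\],?", "", df$info))

print(df)
```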

2.) This will remove brackets and their content:

\\[  matches a literal "["
.*   matches any characters that follow
\\]  matches a literal "]"

library(dplyr)
library(stringr)

df %>% 
  # no "$" anchor, so "[12]" is removed even though "," follows it
  mutate(info = str_replace_all(info, "\\[.*\\]", ""))
  order      info
1     1  an apple
2     2 a banana,
3     3       456
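An end anchor changes which strings match: `\\[.*\\]$` only matches when `]` is the very last character, so a string ending in `],` is left untouched. A small base R check, using sub() in place of str_replace_all() on the assumption that both engines treat this simple pattern the same way:

```r
info <- c("an apple", "a banana[12],", "456[Ab]")

anchored   <- sub("\\[.*\\]$", "", info)  # "]" must be the final character
unanchored <- sub("\\[.*\\]", "", info)   # "]" may appear anywhere

print(anchored)    # "an apple" "a banana[12]," "456"
print(unanchored)  # "an apple" "a banana," "456"
```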

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Tim Biegeleisen
Solution 2 akrun
Solution 3