R: How to remove part of a string with a specific start and end in an R data frame?
I have a data frame like this:
df = data.frame(order = c(1,2,3), info = c("an apple","a banana[12],","456[Ab]"))
I want to clean up the info column by removing the square brackets and everything inside them (plus the trailing comma after "[12]"), so that df$info becomes "an apple" "a banana" "456".
Please help...
Solution 1:[1]
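Use gsub with a lazy (non-greedy) .*?, so each [...] group is removed on its own:
df$info <- gsub("\\[.*?\\]", "", df$info)
This leaves the trailing comma in "a banana,". To drop it as well, use this pattern instead, which also consumes an optional comma right after the closing bracket:
df$info <- gsub("\\[.*?\\],?", "", df$info)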
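To see why the lazy .*? matters, here is a small sketch comparing it with a greedy .* on a vector. The extra element "kiwi[3] and fig[4]" is a hypothetical input (not from the question) with two bracket groups, which is where the two patterns differ.

```r
# Hypothetical input: the last element has two bracket groups
x <- c("an apple", "a banana[12],", "456[Ab]", "kiwi[3] and fig[4]")

# Greedy ".*" runs to the LAST "]", so the text between the groups is lost
greedy <- gsub("\\[.*\\]", "", x)
# greedy: "an apple" "a banana," "456" "kiwi"

# Lazy ".*?" stops at the FIRST "]" after each "[", so groups are removed one by one
lazy <- gsub("\\[.*?\\]", "", x)
# lazy: "an apple" "a banana," "456" "kiwi and fig"
```

For the question's data both patterns give the same result, because each string has at most one bracket group.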
Solution 2:[2]
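In base R, we can also use trimws with a custom whitespace pattern, which drops everything from the first [ to the end of the string (so the trailing comma goes too):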
df$info <- trimws(df$info, whitespace = "\\[.*")
df$info
[1] "an apple" "a banana" "456"
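Because the pattern "\\[.*" runs to the end of the string, this approach also discards any text that follows the brackets. The sketch below illustrates that with a hypothetical element "kiwi[3] fruit" (not in the question) and shows an equivalent sub() call with an explicit end-of-string anchor.

```r
# "kiwi[3] fruit" is a hypothetical extra element with text after the brackets
x <- c("an apple", "a banana[12],", "456[Ab]", "kiwi[3] fruit")

# trimws() strips the pattern from the end of each string, so everything
# from the first "[" onward is removed, including " fruit"
res_trim <- trimws(x, whitespace = "\\[.*")
# res_trim: "an apple" "a banana" "456" "kiwi"

# The same idea written directly with sub() and a "$" anchor
res_sub <- sub("\\[.*$", "", x)
identical(res_trim, res_sub)  # TRUE
```

If brackets can appear in the middle of a string and the text after them should be kept, the gsub approach from Solution 1 is the safer choice.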
Solution 3:[3]
1.) This gives the expected output, also removing the trailing comma:
library(dplyr)
library(stringr)
df %>%
  mutate(info = str_trim(str_replace_all(info, "(\\[.*\\])\\,?", "")))
  order     info
1     1 an apple
2     2 a banana
3     3      456
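The same pipeline can be written in base R, with gsub() in place of str_replace_all() and trimws() in place of str_trim(). This is a sketch, not the answer's code: it swaps in the lazy .*? from Solution 1, and the space in "a banana [12]," is a hypothetical variation of the question's data, added to show why the trimming step is useful.

```r
# Hypothetical variant of the question's data: note the space before "[12]"
df <- data.frame(order = c(1, 2, 3),
                 info = c("an apple", "a banana [12],", "456[Ab]"))

# Remove each bracket group plus an optional trailing comma, then trim the
# leftover whitespace (the role str_trim() plays in the dplyr version)
df$info <- trimws(gsub("\\[.*?\\],?", "", df$info))
df$info  # "an apple" "a banana" "456"
```

Without trimws() the second value would be "a banana " with a trailing space.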
2.) This removes only the brackets and their content, so the trailing comma stays:
- \\[ matches a literal [
- .* matches any following characters
- \\] matches a literal ]
library(dplyr)
library(stringr)
df %>%
  mutate(info = str_replace_all(info, "\\[.*\\]", ""))
  order      info
1     1  an apple
2     2 a banana,
3     3       456
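To check what a pattern like this actually matches, regexpr() and regmatches() can extract the matched text from each string. This short sketch runs on the question's info values and also shows why adding a "$" anchor after \\] would stop "a banana[12]," from matching at all: its last character is a comma, not "]".

```r
x <- c("an apple", "a banana[12],", "456[Ab]")

# regexpr() finds the first match in each string; regmatches() returns the
# matched text and skips strings with no match
matched <- regmatches(x, regexpr("\\[.*\\]", x))
matched   # "[12]" "[Ab]"

# With "$" the "]" must be the final character, so only "456[Ab]" matches
anchored <- regmatches(x, regexpr("\\[.*\\]$", x))
anchored  # "[Ab]"
```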
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Tim Biegeleisen |
| Solution 2 | akrun |
| Solution 3 | |
