Extract text after first upper case or space
How can I extract all the text after the first space in a column where the data looks like this:
df <- structure(list(value = c("1.1.a Blue sea", "1.2.a Red ball")), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"))
so that I get a new column containing just:
Blue sea
Red ball
Solution 1:[1]
You can use the following code to select all text after the first run of whitespace:
sub("^\\S+\\s+", '', df$value)
Output:
[1] "Blue sea" "Red ball"
To store the result as a new column, use the same call inside mutate from dplyr:
library(dplyr)
df %>%
mutate(new_value = sub("^\\S+\\s+", '', value))
Output:
# A tibble: 2 × 2
  value          new_value
  <chr>          <chr>
1 1.1.a Blue sea Blue sea
2 1.2.a Red ball Red ball
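The sub() call above leaves a string unchanged whenever the pattern does not match, for example when the value has no whitespace at all or starts with whitespace. The following is a minimal sketch of that behaviour, not part of the original answer: the helper name after_first_space and the input vector are made up for illustration, and the extra leading \\s* is an assumption about how you might want to treat padded values.

```r
# Hypothetical helper (not from the original answer): same idea as the
# sub() call above, but it also skips whitespace before the first token.
after_first_space <- function(x) {
  sub("^\\s*\\S+\\s+", "", x)
}

# Made-up inputs chosen to exercise edge cases.
x <- c("1.1.a Blue sea", "  1.2.a Red ball", "NoSpace", "1.3.a\tGreen  hill")

original <- sub("^\\S+\\s+", "", x)  # pattern from the answer
tolerant <- after_first_space(x)     # variant that ignores leading whitespace

print(original)
print(tolerant)
```

With the answer's pattern, the padded second element and the space-free third element should come back untouched. The variant strips the padded one but still returns "NoSpace" as is, so check for unmatched rows separately if they need to become NA.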
Solution 2:[2]
You can use str_extract from the package stringr to extract anything that starts with an upper case letter ([[:upper:]]) followed by one or more characters (.+) until the end of the string ($).
library(stringr)
str_extract(df$value, "[[:upper:]].+$")
If you don't want to write a regular expression, you can use str_split to split each string into two parts at the first space (n = 2 limits the result to two pieces) and keep the second part.
str_split(df$value, " ", n = 2, simplify = TRUE)[, 2]
Output:
Both methods produce the same result:
[1] "Blue sea" "Red ball"
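The two criteria in the title, "first upper case" and "first space", agree on the sample data but can diverge on other values. The sketch below uses base R only, so it runs without stringr. The input vector is made up for illustration, and the sub() pattern for the upper-case rule is a stand-in for str_extract, not the answer's own code. One difference to keep in mind is that str_extract returns NA when nothing matches, whereas this sub() variant returns an empty string.

```r
# Made-up inputs: an upper-case letter inside the prefix, and a value
# with no upper-case letter at all.
x <- c("1.1.a Blue sea", "1.1.A Blue sea", "1.3.a green hill")

# Drop everything before the first upper-case letter (base R stand-in
# for the str_extract approach).
from_upper <- sub("^[^[:upper:]]*", "", x)

# Drop the first token and the whitespace after it.
after_space <- sub("^\\S+\\s+", "", x)

print(from_upper)
print(after_space)
```

If the prefix can contain capital letters, or the text can start in lower case, the first-space rule is the more predictable of the two.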
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Quinten |
Solution 2 |