Extract emails in brackets
I work with gmailr and need to extract email addresses from angle brackets <> (sometimes several in one row). When there are no brackets (e.g. [email protected]), I need to keep the element as it is.
This is an example:
x2 <- c("John Smith <[email protected]> <[email protected]>","[email protected]" ,
"<[email protected]>")
I need output like:
[1] "[email protected]" "[email protected]"
[2] "[email protected]"
[3] "[email protected]"
I tried this, intending to merge the two results:
library("qdapRegex")
y1 <- ex_between(x2, "<", ">", extract = FALSE)
y2 <- rm_between(x2, "<", ">", extract = TRUE )
A sample of the code I use on my data:
from <- sapply(msgs_meta, gm_from)
from[sapply(from, is.null)] <- NA
from1 <- rm_bracket(from)
from2 <- ex_bracket(from)
gmail_DK <- gmail_DK %>%
mutate(from = unlist(y1)) %>%
mutate(from = unlist(y2))
but when I apply this to my data (only one day of emails) and unlist the result, I get:
Error in `mutate()`:
! Problem while computing `cc = unlist(cc2)`.
x `cc` must be size 103 or 1, not 104.
Run `rlang::last_error()` to see where the error occurred.
I suppose that with data from more days the difference would be even bigger, so I would prefer not to go this way.
An answer in R is preferred, but if you know how to do it in, for example, Power Query, that would be great too.
Solution 1:[1]
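We may also use base R: split the strings at the whitespace that follows a > (strsplit), then capture the substring between < and > with sub. In the replacement, \\1 is the backreference to the captured group, and [^>]+ matches one or more characters that are not a >. Strings without brackets do not match the pattern, so sub returns them unchanged. Note that the result is a single flat vector, not one element per input string.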
sub(".*<([^>]+)>", "\\1", unlist(strsplit(x2,
"(?<=>)\\s+", perl = TRUE)))
[1] "[email protected]" "[email protected]"
[3] "[email protected]" "[email protected]"
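If one result per input string is needed, as in the output the question asks for, the same bracket pattern can be applied element by element. The following is only a sketch under assumptions: the addresses are made up for illustration, and extract_emails is a hypothetical helper name, not part of any package.

```r
# Hypothetical addresses, made up for illustration
x <- c("John Smith <john@example.com> <j.smith@example.org>",
       "anna@example.com",
       "<bob@example.net>")

# Hypothetical helper: returns a list with one character vector per input
extract_emails <- function(s) {
  # Pull every <...> chunk out of each string (list, one entry per string)
  hits <- regmatches(s, gregexpr("<[^>]+>", s))
  # Strip the brackets; keep the original string when nothing was bracketed
  Map(function(h, orig) if (length(h)) gsub("[<>]", "", h) else orig,
      hits, s)
}

res <- unname(extract_emails(x))
lengths(res)  # number of addresses found per input string
```

Because the result has exactly one element per input string, it should be usable as a list-column, which avoids the size mismatch that unlist() causes when a row holds more than one address.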
Solution 2:[2]
Clunky but OK?
(x2
## split into single words/tokens
%>% strsplit(" ")
%>% unlist()
## find e-mail-like strings, with or without brackets
%>% stringr::str_extract("<?[\\w-.]+@[\\w-.]+>?")
## drop elements with no e-mail component
%>% na.omit()
## strip brackets
%>% stringr::str_remove_all("[<>]")
)
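The same idea (match anything e-mail-like, with or without brackets) can also be sketched in base R without splitting into tokens first, which keeps the matches grouped by input string. This is an illustration under assumptions: the addresses are made up, and the pattern is a deliberately loose stand-in for "e-mail-like", not a full address validator.

```r
# Hypothetical addresses, made up for illustration
x <- c("John Smith <john@example.com> <j.smith@example.org>",
       "anna@example.com",
       "<bob@example.net>")

# Loose "e-mail-like" token: non-space, non-bracket characters around an @
pattern <- "[^[:space:]<>]+@[^[:space:]<>]+"

# gregexpr() finds all matches per element; regmatches() returns a list
# with one character vector per input string
emails <- regmatches(x, gregexpr(pattern, x))

# Collapse to one string per input so the result has the same length as x
emails_flat <- vapply(emails, paste, character(1), collapse = "; ")
```

emails_flat has the same length as x, so it fits an ordinary character column; emails keeps the addresses separate if a list is preferred.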
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | akrun |
| Solution 2 | Ben Bolker |
