Extract text immediately before double colons
I have a string like this:
text <- "This is some text::stuff. Look, there's some::more. And here::is some more."
I would like to extract the words immediately before the double colons. To do this, I use gregexpr to match alphanumeric characters directly followed by double colons:
m <- gregexpr("[[:alnum:]]*::", text)
Then, I call regmatches to pull out this text, unlist the result to a vector, and finally strip out the double colons with gsub.
gsub("::", "", unlist(regmatches(text, m)))
#[1] "text" "some" "here"
This is the desired result, but it relies on four function calls. Is there a more efficient way of achieving the same result?
Solution 1:[1]
You can use lookahead and str_extract_all to do it all in one go:
library(stringr)
str_extract_all(text, "\\w+(?=::)")[[1]]
[1] "text" "some" "here"
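One difference from the pattern in the question is worth checking: \w also matches the underscore, while [[:alnum:]] covers letters and digits only. The sketch below illustrates this in base R with perl=TRUE, so it runs without stringr; the input string x is made up for illustration and is not from the question.

```r
# Hypothetical input containing an underscore (not from the question).
x <- "foo_bar::baz"

# \w covers letters, digits and the underscore.
word_hits <- unlist(regmatches(x, gregexpr("\\w+(?=::)", x, perl = TRUE)))

# [[:alnum:]] covers letters and digits only, so the match starts after "_".
alnum_hits <- unlist(regmatches(x, gregexpr("[[:alnum:]]+(?=::)", x, perl = TRUE)))

print(word_hits)
print(alnum_hits)
```

Here the first pattern returns "foo_bar" and the second returns "bar". If identifiers with underscores can appear in your text, pick the character class that matches what you mean by a "word".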
Solution 2:[2]
You can use
m <- gregexpr("[[:alnum:]]+(?=::)", text, perl=TRUE)
Here, [[:alnum:]]+(?=::) matches one or more letters or digits and then checks that they are immediately followed by two colons without consuming the colons, since (?=...) is a non-consuming lookahead construct.
Note that the perl=TRUE argument is required here, since the default TRE regex engine does not support lookarounds. perl=TRUE enables the PCRE regex engine, which supports both lookbehinds and lookaheads.
See an R demo:
text <- "This is some text::stuff. Look, there's some::more. And here::is some more."
m <- gregexpr("[[:alnum:]]+(?=::)", text, perl=TRUE)
unlist(regmatches(text, m))
## => [1] "text" "some" "here"
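To see why perl=TRUE matters, the following sketch runs the same pattern under both engines. It assumes the default engine rejects the lookahead with an error, so that call is wrapped in tryCatch; the variable names tre_result and pcre_result are only illustrative.

```r
text <- "This is some text::stuff. Look, there's some::more. And here::is some more."
pattern <- "[[:alnum:]]+(?=::)"

# Default engine (TRE): the lookahead is expected to be rejected, so trap
# the error (and silence any accompanying warning) instead of stopping.
tre_result <- tryCatch(
  suppressWarnings(unlist(regmatches(text, gregexpr(pattern, text)))),
  error = function(e) paste("TRE failed:", conditionMessage(e))
)

# PCRE engine: the same pattern compiles and the lookahead works.
pcre_result <- unlist(regmatches(text, gregexpr(pattern, text, perl = TRUE)))

print(tre_result)
print(pcre_result)
```

The exact wording of the TRE error message may vary between R versions, so it is printed rather than assumed.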
Solution 3:[3]
You could also use a capture group instead of a lookaround, and use the + quantifier ([[:alnum:]]+, one or more characters) rather than * so that empty strings are not matched:
library(stringr)
text <- "This is some text::stuff. Look, there's some::more. And here::is some more."
str_match_all(text, "([[:alnum:]]+)::")[[1]][,2]
Output
[1] "text" "some" "here"
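The effect of the quantifier can be checked in base R with the question's own gregexpr/regmatches/gsub pipeline. This is a small sketch on a made-up input x (not from the question) in which one "::" has no word directly before it.

```r
# Hypothetical input with a "::" that has no word directly before it.
x <- "a :: b::c"

# Zero or more: the bare "::" also matches, leaving "" after cleanup.
star_hits <- gsub("::", "", unlist(regmatches(x, gregexpr("[[:alnum:]]*::", x))))

# One or more: a bare "::" cannot match, so only "b" is returned.
plus_hits <- gsub("::", "", unlist(regmatches(x, gregexpr("[[:alnum:]]+::", x))))

print(star_hits)
print(plus_hits)
```

With * the result contains an empty string alongside "b"; with + only "b" remains.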
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Chris Ruehlemann |
| Solution 2 | Wiktor Stribiżew |
| Solution 3 | The fourth bird |
