Extract text immediately before double colons

I have a string like this:

text <- "This is some text::stuff. Look, there's some::more. And here::is some more."

I would like to extract the words before the double colons. To do this, I use gregexpr to match runs of alphanumeric characters immediately before double colons:

m <- gregexpr("[[:alnum:]]*::", text)

Then, I call regmatches to pull out this text, unlist the result to a vector, and finally strip out the double colons with gsub.

gsub("::", "", unlist(regmatches(text, m)))
#[1] "text" "some" "here"

This is the desired result, but relies on four function calls. Is there a more efficient way of achieving the same result?
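For reference, here are the intermediate values of that four-step pipeline, using only base R and the same `text`. The sketch shows why each call is there: `regmatches()` returns a list, `unlist()` flattens it, and `gsub()` removes the colons that the pattern consumed.

```r
text <- "This is some text::stuff. Look, there's some::more. And here::is some more."

# Match runs of alphanumerics followed by "::" (the colons are part of each match)
m <- gregexpr("[[:alnum:]]*::", text)

# regmatches() returns a list with one character vector per input string
matches_list <- regmatches(text, m)
str(matches_list)

# unlist() flattens that list into a single character vector
matches_vec <- unlist(matches_list)
print(matches_vec)   # "text::" "some::" "here::"

# gsub() then strips the trailing "::" from each match
words <- gsub("::", "", matches_vec)
print(words)         # "text" "some" "here"
```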



Solution 1:[1]

You can use a lookahead with stringr's str_extract_all to do it all in one go:

library(stringr)
str_extract_all(text, "\\w+(?=::)")[[1]]
#[1] "text" "some" "here"
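Note that `\\w` is not the same class as `[[:alnum:]]`: in PCRE, and in the ICU engine that stringr uses, `\\w` also matches the underscore. The two patterns can therefore disagree on identifiers such as `my_var`. A small base-R check, assuming PCRE via `perl = TRUE` (the input string is made up for illustration):

```r
x <- "call my_var::method here"

# \\w includes "_", so the whole identifier is returned
w_match <- regmatches(x, gregexpr("\\w+(?=::)", x, perl = TRUE))[[1]]

# [[:alnum:]] excludes "_", so only the part after the underscore is returned
alnum_match <- regmatches(x, gregexpr("[[:alnum:]]+(?=::)", x, perl = TRUE))[[1]]

print(w_match)      # "my_var"
print(alnum_match)  # "var"
```

Which one to prefer depends on whether underscores should count as part of the word before `::`.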

Solution 2:[2]

You can use

m <- gregexpr("[[:alnum:]]+(?=::)", text, perl=TRUE)

Here, [[:alnum:]]+(?=::) matches one or more letters or digits and then checks that they are immediately followed by two colons without consuming the colons, since (?=...) is a non-consuming lookahead construct.
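Because the lookahead is non-consuming, the colons are not part of the match. One way to check this, sketched with the `match.length` attribute that `gregexpr()` returns, is to compare the lookahead pattern against a consuming one:

```r
text <- "This is some text::stuff. Look, there's some::more. And here::is some more."

# Lookahead version: "::" is checked but not included in the match
look <- gregexpr("[[:alnum:]]+(?=::)", text, perl = TRUE)[[1]]
# Consuming version: "::" is part of each match
cons <- gregexpr("[[:alnum:]]+::", text, perl = TRUE)[[1]]

starts   <- as.integer(look)
look_len <- as.integer(attr(look, "match.length"))
cons_len <- as.integer(attr(cons, "match.length"))
print(look_len)  # 4 4 4
print(cons_len)  # 6 6 6

# The two characters right after each lookahead match are the colons it peeked at
after <- substring(text, starts + look_len, starts + look_len + 1)
print(after)     # "::" "::" "::"
```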

Mind that the perl=TRUE argument is required here, since the default TRE regex engine does not support lookarounds. perl=TRUE enables the PCRE regex engine, which supports both lookbehinds and lookaheads.
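Since PCRE also supports lookbehinds, the same idea works in the other direction. As an illustration beyond the original question, a lookbehind `(?<=::)` can pull out the word immediately after each double colon:

```r
text <- "This is some text::stuff. Look, there's some::more. And here::is some more."

# (?<=::) requires "::" just before the match without consuming it;
# like the lookahead, this needs perl = TRUE
after_colons <- regmatches(text, gregexpr("(?<=::)[[:alnum:]]+", text, perl = TRUE))[[1]]
print(after_colons)  # "stuff" "more" "is"
```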

See an R demo:

text <- "This is some text::stuff. Look, there's some::more. And here::is some more."
m <- gregexpr("[[:alnum:]]+(?=::)", text, perl=TRUE)
unlist(regmatches(text, m))
## => [1] "text" "some" "here"
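`unlist()` is only needed because `regmatches()` returns a list. When the input holds several strings, keeping that list preserves which matches came from which string. A sketch with a made-up character vector `texts`:

```r
texts <- c(
  "This is some text::stuff. Look, there's some::more. And here::is some more.",
  "no double colons here",
  "a::b c::d"
)

# One list element per input string; strings without a match give character(0)
per_string <- regmatches(texts, gregexpr("[[:alnum:]]+(?=::)", texts, perl = TRUE))
str(per_string)

# Count how many matches each string produced
n_matches <- lengths(per_string)
print(n_matches)  # 3 0 2
```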

Solution 3:[3]

You could also use a capture group instead of a lookaround, and use [[:alnum:]]+ (one or more characters) rather than [[:alnum:]]* to avoid matching empty strings:

library(stringr)

text <- "This is some text::stuff. Look, there's some::more. And here::is some more."
str_match_all(text, "([[:alnum:]]+)::")[[1]][,2]

Output

[1] "text" "some" "here"
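To see the empty-string problem that `+` avoids, compare `*` and `+` on an input where `::` is preceded by a space (a made-up string for illustration). With `*`, the alphanumeric part may match zero characters, so a bare `::` is matched and turns into an empty string once the colons are stripped:

```r
x <- "a :: b and c::d"

# Question's pattern: [[:alnum:]]* may match nothing, so " ::" yields a match of just "::"
star <- gsub("::", "", regmatches(x, gregexpr("[[:alnum:]]*::", x))[[1]])
# Requiring at least one alphanumeric character drops that empty match
plus <- gsub("::", "", regmatches(x, gregexpr("[[:alnum:]]+::", x))[[1]])

print(star)  # ""  "c"
print(plus)  # "c"
```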


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Chris Ruehlemann
Solution 2 Wiktor Stribiżew
Solution 3 The fourth bird