gsub: How to extract words between two words

I know many people have already posted questions related to mine, but I couldn't find the correct solution.

I have a lot of sentences like: "Therapie: I like the elephants so much Indication"

I want to extract all the words between "Therapie:" and "Indication". For the example above, that would be "I like the elephants so much".

When I use my code, I always get only the next three words back. What am I doing wrong?

my_df <- c("Therapie: I like the elephants so much Indication")

These are sentences taken from the documents, and I need just the words between "Therapie:" and "Indikation:".

Examples: 
 ____________________________________________________________________________ _____    Diagnose:   Blepharochalasis    Therapie:   Oberlidplastik und Fettresektion mediales und nasales Pocket   Indikation: 

  ____________________________________________________________________________ _____    Diagnose:   Mammahypoplasie    Therapie:   Dual Plane Augmentation bds. über IMF Schnitt  Indikation: 



exc <- sub(".*?\\bTherapie\\W+(\\w+(?:\\W+\\w+){0,2}).*", "\\1", my_df, perl = TRUE)

r regex string

Solution 1:^[1]

With str_match. The \\s* on either side of the capture group trims the surrounding whitespace.

str <- "Therapie: I like the elephants so much Indication"

library(stringr)
str_match(str, "Therapie:\\s*(.*?)\\s*Indication")[, 2]
# [1] "I like the elephants so much"

What about a custom function?

str_between <- function(str, w1, w2){
  stringr::str_match(str, paste0(w1, "\\s*(.*?)\\s*", w2))[, 2]
}

str_between(str, "Therapie:", "Indication")
# [1] "I like the elephants so much"
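
If stringr is not available, or if the delimiters may contain regex metacharacters such as "." or "(", a base R variant along the same lines is possible. This is only a sketch: str_between_base is a hypothetical name, not part of any package, and it assumes PCRE's \Q...\E quoting is acceptable for matching w1 and w2 literally.

```r
# Hypothetical base R counterpart of str_between() (name is an assumption).
# \\Q...\\E makes PCRE treat w1 and w2 as literal text; this breaks only if
# a delimiter itself contains the sequence \E.
str_between_base <- function(x, w1, w2) {
  pattern <- paste0("\\Q", w1, "\\E\\s*(.*?)\\s*\\Q", w2, "\\E")
  hits <- regmatches(x, regexec(pattern, x, perl = TRUE))
  # Each hit is c(full match, capture group), or character(0) when nothing matched
  vapply(hits, function(h) if (length(h) >= 2) h[2] else NA_character_, character(1))
}

str_between_base("Therapie: I like the elephants so much Indication", "Therapie:", "Indication")
# [1] "I like the elephants so much"
```

Like str_match, this returns NA for elements that contain no match, so the result stays aligned with the input vector.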

Solution 2:^[2]

You can use sub with a capture group:

my_df <- c("Therapie: I like the elephants so much Indication")
sub("^Therapie: (.*) Indication$", "\\1", my_df)
#> [1] "I like the elephants so much"
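
The ^ and $ anchors above require the string to start with "Therapie:" and end with "Indication", which is not the case for the longer sample lines in the question. A possible adaptation is sketched below. It assumes the real delimiters are "Therapie:" and "Indikation:" as in those samples; the docs vector is an illustrative copy of the two sample lines with the underscore rule left out.

```r
# Illustrative copies of the two sample lines from the question
docs <- c(
  "Diagnose:   Blepharochalasis    Therapie:   Oberlidplastik und Fettresektion mediales und nasales Pocket   Indikation: ",
  "Diagnose:   Mammahypoplasie    Therapie:   Dual Plane Augmentation bds. über IMF Schnitt  Indikation: "
)

# The leading and trailing .* absorb everything outside the two keywords.
# \\s* outside the capture group drops the padding around the extracted text.
# perl = TRUE is set so the lazy quantifier .*? is handled by PCRE.
therapie <- sub(".*Therapie:\\s*(.*?)\\s*Indikation:.*", "\\1", docs, perl = TRUE)
therapie
```

Note that sub returns the input unchanged when the pattern does not match, so lines without both keywords come back whole rather than as NA.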

Solution 3:^[3]

An option with trimws from base R (the whitespace argument requires R 3.6.0 or later)

trimws(str, whitespace = ".*:\\s+|\\s+Indication.*")
# [1] "I like the elephants so much"

data

str <- "Therapie: I like the elephants so much Indication"

Solution 4:^[4]

Another way using strsplit:

str <- "Therapie: I like the elephants so much Indication"

!strsplit(str, " ")[[1]] %in% c("Therapie:", "Indication") -> x
paste0(strsplit(str, " ")[[1]][x], collapse = ' ')
#"I like the elephants so much"

Solution 5:^[5]

Another option with a match only:

str <- "Therapie: I like the elephants so much Indication"
regmatches(str, regexpr("\\bTherapie:\\h*\\K.*?(?=\\h*\\bIndication\\b)", str, perl=TRUE))

Output

[1] "I like the elephants so much"

The pattern matches:

\bTherapie: A word boundary to prevent a partial word match, followed by the word Therapie and a colon
\h*\K Match optional horizontal whitespace, then discard what has been matched so far
.*? Match as few characters as possible
(?=\h*\bIndication\b) Positive lookahead: assert optional horizontal whitespace and the word Indication to the right
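
When the same pattern is applied to a whole vector, regmatches drops every element without a match, so the result can end up shorter than the input. A minimal sketch of one way to keep the positions aligned follows; the second element of x is made up for illustration.

```r
x <- c(
  "Therapie: I like the elephants so much Indication",
  "Diagnose: no therapy field in this line"  # made-up line without a match
)

pattern <- "\\bTherapie:\\h*\\K.*?(?=\\h*\\bIndication\\b)"
m <- regexpr(pattern, x, perl = TRUE)  # -1 marks elements without a match

# Fill a result vector of the same length as the input,
# leaving NA where the pattern did not match.
out <- rep(NA_character_, length(x))
out[m != -1] <- regmatches(x, m)
out
# [1] "I like the elephants so much" NA
```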


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1
Solution 2
Solution 3	akrun
Solution 4	AlexB
Solution 5	The fourth bird
