Assign another column when text matches a keyword
I want to add a column that takes the value 1 when a keyword matches a word in the text and 0 otherwise. If the same word appears several times in the text, the value should still be at most 1.
Let's say I have this dataset:
df = structure(list(text = c("I hate good cheese", "cheese that smells is the best",
"isn't it obvious that green cheese serves you well",
"don't fight it just eat the cheese", "the last good cheese is down"),
stuff = c(3, 2, 40, 4, 5) ), row.names = c(NA, 5L),
class = c("tbl_df", "tbl", "data.frame"))
with the following keywords to search for:
keywords = structure(list(keyword_one = c("cheese", "blue", "best"),
keyword_two = c("smells", "final", 'south')
),
row.names = c(NA, -3L),
class = c("tbl_df", "tbl", "data.frame"))
I can do the following:
library(stringr)
df[str_detect(df$text, keywords$keyword_one),]
This returns the rows where a keyword matches, but how do I keep all the rows and assign the value 1 where there is a match? Something like:
# A tibble: 5 × 2
  text                                               stuff keyword1 keyword2
* <chr>                                              <dbl>
1 I hate good cheese                                     3        1        0
2 cheese that smells is the best                         2        0        1
3 isn't it obvious that green cheese serves you well    40        0        0
4 don't fight it just eat the cheese                     4        1        0
5 the last good cheese is down                           5        0        0
Alternatively, I found that I could do:
ifelse(str_detect(df$text, keywords$keyword_one), 1, 0)
ifelse(str_detect(df$text, keywords$keyword_two), 1, 0)
However, this is inefficient if I have many keyword columns and want to iterate over all of them.
Furthermore, I have noticed that str_detect does not seem to detect the word cheese in all of the texts. Why is this happening?
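One possible explanation for the missed matches, offered as a sketch rather than a definitive answer: str_detect is vectorised over both of its arguments, so a 3-element keyword column is paired with the 5 texts element by element instead of every keyword being tested against every text (older stringr versions recycle the patterns; newer ones may raise a recycling error instead). The example below assumes the intended rule is "1 if any keyword in the column occurs in the text". It collapses each keyword column into a single alternation pattern and loops over all keyword columns. The names patterns, flags and result are illustrative only, and the data is rebuilt with data.frame for brevity.

```r
library(stringr)

# Same data as in the question, rebuilt as plain data frames.
df <- data.frame(
  text = c("I hate good cheese", "cheese that smells is the best",
           "isn't it obvious that green cheese serves you well",
           "don't fight it just eat the cheese", "the last good cheese is down"),
  stuff = c(3, 2, 40, 4, 5),
  stringsAsFactors = FALSE
)
keywords <- data.frame(
  keyword_one = c("cheese", "blue", "best"),
  keyword_two = c("smells", "final", "south"),
  stringsAsFactors = FALSE
)

# Collapse each keyword column into one alternation pattern,
# e.g. "cheese|blue|best", so every text is tested against every keyword.
patterns <- vapply(keywords, function(kw) str_c(kw, collapse = "|"), character(1))

# Build one 0/1 indicator vector per keyword column (names are carried over).
flags <- lapply(patterns, function(p) as.integer(str_detect(df$text, p)))

# Keep all original rows and append the indicator columns.
result <- cbind(df, as.data.frame(flags))
print(result)
```

With this data every text contains "cheese", so keyword_one is 1 for all five rows, and keyword_two is 1 only for the second row. Because the patterns are regular expressions, they also match inside longer words; wrapping each keyword in \\b word boundaries would restrict matches to whole words.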
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
