How to set column value based on string (mis)match between two other columns?

I want to create a match variable in a dataframe that

  • is 1 if the value of another variable (string) is contained in the value of a third variable (string)
  • is 0 if that is not the case
  • and is NA if either of the string variables is NA

So far I have tried (str_contains function from the sjmisc package):

df$match[(df$str1 == "left"  & str_contains(df$str2, "left"))
                  | (df$str1== "right"  & str_contains(df$str2, "right"))] = 1

df$match[(df$str1== "left"  & str_contains(df$str2, "left", logic = "not")) 
                  | (df$str1== "right"  & str_contains(df$str2, "right", logic = "not"))] = 0

df$match[is.na(df$str1)| is.na(df$str2)] = NA

But only the NA part works. For every other row I get match = 1, which is wrong given the data.

Data example:

str1   str2            match
left   right           -
right  somewhat left   -
left   very left       -
right  right           -
right  somewhat right  -

match should be 0,0,1,1,1 in the example, but ends up all 1 instead. I'd be grateful for any suggestions about what's wrong here, or for alternative ways to achieve the result I want!



Solution 1:[1]

A base R solution:

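within(df, {
  # Row by row: grepl(pattern = str1[i], x = str2[i]).
  # The unary + converts TRUE/FALSE to 1/0 and keeps NA as NA.
  match <- +mapply(grepl, str1, str2)
})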

#    str1           str2 match
# 1  left          right     0
# 2 right  somewhat left     0
# 3  left      very left     1
# 4 right          right     1
# 5 right somewhat right     1
# 6  <NA>           <NA>    NA
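
One caveat worth checking against your real data: grepl() returns FALSE rather than NA when only the searched string (here str2) is NA, so a row with a value in str1 but a missing str2 would get 0 instead of NA. Below is a minimal sketch of one way to cover that case. The seventh row and the fixed = TRUE argument (which treats str1 as a literal string rather than a regular expression) are additions for illustration, not part of the answer above.

```r
# Example data; row 7 (str1 present, str2 missing) is a hypothetical extra case
df <- data.frame(
  str1 = c("left", "right", "left", "right", "right", NA, "left"),
  str2 = c("right", "somewhat left", "very left", "right",
           "somewhat right", NA, NA),
  stringsAsFactors = FALSE
)

# Row-wise literal search: grepl(str1[i], str2[i], fixed = TRUE)
hit <- mapply(grepl, df$str1, df$str2,
              MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)

# Logical -> integer (TRUE = 1, FALSE = 0, NA stays NA)
df$match <- as.integer(hit)

# Force NA wherever either input is missing
df$match[is.na(df$str1) | is.na(df$str2)] <- NA

print(df)
```

Without the last assignment, row 7 would come out as 0, because the pattern "left" is simply not found in a missing string.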

Data
df <- structure(list(str1 = c("left", "right", "left", "right", "right", 
NA), str2 = c("right", "somewhat left", "very left", "right", 
"somewhat right", NA)), row.names = c(NA, -6L), class = "data.frame")

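As for why the attempt in the question gives 1 everywhere: as far as I can tell, str_contains() is not vectorised over rows. Given a whole column it answers "does the pattern occur anywhere in this vector?" with a single TRUE or FALSE, which is then recycled across all rows. With logic = "not" that single value is FALSE, so the second assignment selects no rows at all. The sketch below reproduces the effect in base R; contains_anywhere() is a hypothetical stand-in for that behaviour, not the sjmisc function itself.

```r
str1 <- c("left", "right", "left", "right", "right")
str2 <- c("right", "somewhat left", "very left", "right", "somewhat right")

# Hypothetical stand-in: one answer for the whole vector, not one per row
contains_anywhere <- function(x, pattern) any(grepl(pattern, x, fixed = TRUE))

# A single TRUE, recycled over all rows: the condition collapses to str1 == "left"
whole_vector <- contains_anywhere(str2, "left")
recycled <- (str1 == "left") & whole_vector

# One value per row, which is what the condition actually needs
per_row <- grepl("left", str2, fixed = TRUE)
row_wise <- (str1 == "left") & per_row

print(recycled)
print(row_wise)
```

Swapping the per-vector check for a per-row one (grepl() on the column, or mapply() when the pattern itself varies by row) is what makes the difference.
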
Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Darren Tsai