How to set column value based on string (mis)match between two other columns?
I want to create a match variable in a dataframe that
- is 1 if the value of another variable (string) is contained in the value of a third variable (string)
- is 0 if that is not the case
- and is NA if either of the string variables is NA
So far I have tried (str_contains function from the sjmisc package):
df$match[(df$str1 == "left" & str_contains(df$str2, "left"))
| (df$str1== "right" & str_contains(df$str2, "right"))] = 1
df$match[(df$str1== "left" & str_contains(df$str2, "left", logic = "not"))
| (df$str1== "right" & str_contains(df$str2, "right", logic = "not"))] = 0
df$match[is.na(df$str1)| is.na(df$str2)] = NA
But only the NA part works. For all remaining rows match is set to 1, which is wrong for this data.
Data example:
| str1 | str2 | match |
|---|---|---|
| left | right | - |
| right | somewhat left | - |
| left | very left | - |
| right | right | - |
| right | somewhat right | - |
match should be 0, 0, 1, 1, 1 in the example, but ends up all 1 instead. I'd be grateful for any suggestions about what's wrong here, or for alternative ways to achieve the result I want!
Solution 1:[1]
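A base R solution:
within(df, {
  # grepl(pattern, x) is applied row by row: is str1 found in str2?
  # The unary + converts the logical result to 0/1 (NA stays NA).
  match <- +mapply(grepl, str1, str2)
})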
#    str1           str2 match
# 1  left          right     0
# 2 right  somewhat left     0
# 3  left      very left     1
# 4 right          right     1
# 5 right somewhat right     1
# 6  <NA>           <NA>    NA
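A side note that is not part of the original answer: grepl treats str1 as a regular expression, and it returns FALSE rather than NA when only the searched string (str2) is missing, so a row with exactly one NA would come out as 0. The following is a minimal sketch of a variant that matches literally and sets NA whenever either column is missing. The data frame df2 and its rows are made up for illustration, with one missing value in each of the last two rows.

```r
# Hypothetical data: rows 4 and 5 each have exactly one missing value.
df2 <- data.frame(
  str1 = c("left", "right", "left", NA, "right"),
  str2 = c("right", "somewhat right", "very left", "left", NA),
  stringsAsFactors = FALSE
)

# Literal (fixed = TRUE) substring test, one row at a time:
# is str1[i] contained in str2[i]?
hit <- mapply(grepl, df2$str1, df2$str2,
              MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)

# Force NA wherever either input is missing, then turn TRUE/FALSE into 1/0.
hit[is.na(df2$str1) | is.na(df2$str2)] <- NA
df2$match <- as.integer(hit)

df2$match
# expected: 0 1 1 NA NA
```

Setting the NA rows explicitly keeps the result independent of how grepl handles missing values on either side, and fixed = TRUE avoids surprises if str1 ever contains characters such as "." or "(".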
Data
df <- structure(list(str1 = c("left", "right", "left", "right", "right",
NA), str2 = c("right", "somewhat left", "very left", "right",
"somewhat right", NA)), row.names = c(NA, -6L), class = "data.frame")
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Darren Tsai |
