How can I decode a column with text from another column in R?
I have a data frame with encoded survey answers in the answer column and the keys, stored as one string per row, in the character column key:
df <- data.frame(answer = c(1, 2, 1, 3, 1),
key = c("1 = Answer One 2 = Answer Two 3 = Answer Three", "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI",
"1 = Answer abc 2 = Answer def 3 = Answer ghi", "1 = Answer One 2 = Answer Two 3 = Answer Three",
"1 = Answer ABC 2 = Answer DEF 3 = Answer GHI"))
print(df)
answer key
1 1 "1 = Answer One 2 = Answer Two 3 = Answer Three"
2 2 "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI"
3 1 "1 = Answer abc 2 = Answer def 3 = Answer ghi"
4 3 "1 = Answer One 2 = Answer Two 3 = Answer Three"
5 1 "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI"
How can I decode the answer column with the data from the key column so that I get this result?
df_result <- data.frame(answer = c(1, 2, 1, 3, 1),
key = c("1 = Answer One 2 = Answer Two 3 = Answer Three", "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI",
"1 = Answer abc 2 = Answer def 3 = Answer ghi", "1 = Answer One 2 = Answer Two 3 = Answer Three",
"1 = Answer ABC 2 = Answer DEF 3 = Answer GHI"),
answer_decoded = c("Answer One", "Answer DEF", "Answer abc", "Answer Three","Answer ABC"))
print(df_result)
answer key answer_decoded
1 1 "1 = Answer One 2 = Answer Two 3 = Answer Three" "Answer One"
2 2 "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI" "Answer DEF"
3 1 "1 = Answer abc 2 = Answer def 3 = Answer ghi" "Answer abc"
4 3 "1 = Answer One 2 = Answer Two 3 = Answer Three" "Answer Three"
5 1 "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI" "Answer ABC"
I cannot use factor labels because there are too many different items to create them manually.
Solution 1:[1]
We can extract the substring that corresponds to each 'answer' value. Use str_c to build the pattern to extract, i.e. paste the 'answer' value, a space, = and one or more non-digit characters (\\D+); then remove the "N = " prefix and any leading or trailing spaces with trimws:
library(stringr)
library(dplyr)
df %>%
  mutate(answer_decoded = trimws(
    # build e.g. "2 = \\D+" and extract "2 = Answer DEF " from the key
    str_extract(key, str_c(answer, ' = \\D+')),
    # strip the leading "N = " and any trailing whitespace
    whitespace = ".*=\\s+|\\s+"))
Output:
answer key answer_decoded
1 1 1 = Answer One 2 = Answer Two 3 = Answer Three Answer One
2 2 1 = Answer ABC 2 = Answer DEF 3 = Answer GHI Answer DEF
3 1 1 = Answer abc 2 = Answer def 3 = Answer ghi Answer abc
4 3 1 = Answer One 2 = Answer Two 3 = Answer Three Answer Three
5 1 1 = Answer ABC 2 = Answer DEF 3 = Answer GHI Answer ABC
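A base-R walk-through of the same three steps may make the pattern logic easier to follow. This is a sketch, not akrun's code: paste0() stands in for str_c(), and regmatches()/regexpr() stand in for str_extract(), so no packages are needed. It assumes R 3.6.0 or later, where trimws() has the whitespace argument.

```r
# Base-R sketch of Solution 1's steps (stand-ins for the stringr calls)
df <- data.frame(
  answer = c(1, 2, 1, 3, 1),
  key = c("1 = Answer One 2 = Answer Two 3 = Answer Three",
          "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI",
          "1 = Answer abc 2 = Answer def 3 = Answer ghi",
          "1 = Answer One 2 = Answer Two 3 = Answer Three",
          "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI"),
  stringsAsFactors = FALSE
)

# Step 1: one pattern per row, e.g. "2 = \\D+" for answer 2
patterns <- paste0(df$answer, " = \\D+")

# Step 2: extract the first match in each key; \\D+ stops before the next
# digit, so the match keeps a trailing space, e.g. "1 = Answer One "
extracted <- mapply(function(p, k) regmatches(k, regexpr(p, k)),
                    patterns, df$key, USE.NAMES = FALSE)

# Step 3: drop the "N = " prefix and the trailing space
decoded <- trimws(extracted, whitespace = ".*=\\s+|\\s+")
# decoded is c("Answer One", "Answer DEF", "Answer abc", "Answer Three", "Answer ABC")
```

The \\D+ part is what stops each match at the next code, which is why it only works as long as the labels themselves contain no digits.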
Solution 2:[2]
Split each string on the "N = " markers with strsplit, then select the nth piece with [ (adding 1 because the split also returns an empty string before the first marker):
mapply(`[`, strsplit(df$key, "(\\s*)\\d = "), df$answer + 1)
#[1] "Answer One" "Answer DEF" "Answer abc" "Answer Three" "Answer ABC"
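The +1 comes from how strsplit() handles a match at the very start of the string: it yields an empty first piece. The sketch below shows this for one key. It also defines a hypothetical helper, decode_key() (not from either answer), that looks labels up by code rather than by position; it assumes every entry has the form "<digits> = <label>" and that labels contain no digits. This matters if codes can have more than one digit: the \\d in Solution 2's pattern matches a single digit, so in a key like "9 = Answer Nine 10 = Answer Ten" it would split at "0 = " and leave a stray "1" on the preceding label.

```r
# Base R only; the key below is the first row of the question's df
key <- "1 = Answer One 2 = Answer Two 3 = Answer Three"

# The pattern matches "1 = " at the very start, so the first piece is ""
pieces <- strsplit(key, "(\\s*)\\d = ")[[1]]
# pieces is c("", "Answer One", "Answer Two", "Answer Three")

# Hence code 3 sits at index 3 + 1
pieces[3 + 1]

# Hypothetical helper: parse one key into a named lookup vector (code -> label).
# Assumes entries look like "<digits> = <label>" and labels contain no digits.
decode_key <- function(key) {
  # codes: the digit runs immediately followed by " = " (PCRE lookahead)
  codes <- regmatches(key, gregexpr("\\d+(?= = )", key, perl = TRUE))[[1]]
  # labels: split on "<optional spaces><digits> = ", dropping the leading ""
  labels <- strsplit(key, "\\s*\\d+ = ")[[1]][-1]
  setNames(labels, codes)
}

answers <- c(1, 2, 1, 3, 1)
keys <- c("1 = Answer One 2 = Answer Two 3 = Answer Three",
          "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI",
          "1 = Answer abc 2 = Answer def 3 = Answer ghi",
          "1 = Answer One 2 = Answer Two 3 = Answer Three",
          "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI")

# Look each answer up by its code (used as a name), not by position
decoded_named <- mapply(function(a, k) unname(decode_key(k)[as.character(a)]),
                        answers, keys, USE.NAMES = FALSE)

# Multi-digit codes also resolve correctly with this helper
decode_key("9 = Answer Nine 10 = Answer Ten")[["10"]]
```

Looking up by name also tolerates codes that are not numbered 1, 2, 3, ... in order, at the cost of parsing each key twice.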
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | akrun |
| Solution 2 | thelatemail |
