Replace content in one column with another column in R
I am working with some survey data and I would like to replace the contents of one survey item/column with another survey item, while otherwise keeping the original cell contents. For example, replace Q2_1.x with Q2_1.y if Q2_1.x is missing (missing is coded as "-99" or as NA_character_).
Here is an example of my data:
library(dplyr)
library(magrittr)
library(readr)
org_dat <- read_table('ID Q2_1.x Q2_2.x Q2_1.y Q2_2.y Q14_1.x Q14_1.y Q15
1 Yes NA NA NA Sometimes NA NA
2 -99 NA No NA NA Always Yes
3 Yes NA Yes NA NA NA NA
4 -99 NA NA No NA Yes No
5 NA -99 NA NA NA Always NA
6 -99 NA NA No NA NA NA') %>% mutate_all(as.character)
Here is my desired output:
dat_out <- read_table('ID Q2_1 Q2_2 Q14_1 Q15
1 Yes NA Sometimes NA
2 No NA Always Yes
3 Yes NA NA NA
4 -99 No Yes No
5 NA -99 Always NA
6 -99 No NA NA')
Current solution:
I know that I can replace each of these columns individually, but I have a lot of columns to deal with and I would like to use a smart dplyr/grepl way of solving this. Any ideas? It is always the case that I am replacing the Q*.x with the Q*.y.
org_dat %>% mutate(Q2_1.x = case_when(!is.na(Q2_1.y) &
Q2_1.x == '-99'| is.na(Q2_1.x) ~ Q2_1.y,
TRUE ~ Q2_1.x)) %>%
mutate(Q2_2.x = case_when(!is.na(Q2_2.y) &
Q2_2.x == '-99'| is.na(Q2_2.x) ~ Q2_2.y,
TRUE ~ Q2_2.x)) %>%
mutate(Q14_1.x = case_when(!is.na(Q14_1.y) &
Q14_1.x == '-99'| is.na(Q14_1.x) ~ Q14_1.y,
TRUE ~ Q14_1.x)) %>%
rename(Q2_1 = Q2_1.x,
Q2_2 = Q2_2.x,
Q14_1 = Q14_1.x) %>%
select(-matches("x|y"))
Solution 1:[1]
The key here is to first translate the user-defined missing codes into real NAs with na_if, and then apply coalesce to each pair of .x and .y columns.
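library(dplyr)
library(stringr)
org_dat %>%
  # the columns are character, so compare against the string '-99'
  mutate(across(everything(), ~na_if(.x, '-99'))) %>%
  # fill each .x column from its .y partner, looked up by name
  mutate(across(ends_with('.x'),
                ~coalesce(.x, get(str_replace(cur_column(), '\\.x$', '.y'))))) %>%
  select(-ends_with('.y')) %>%
  rename_with(~str_remove(.x, '\\..$'))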
# A tibble: 6 × 5
ID Q2_1 Q2_2 Q14_1 Q15
<chr> <chr> <chr> <chr> <chr>
1 1 Yes NA Sometimes NA
2 2 No NA Always Yes
3 3 Yes NA NA NA
4 4 NA No Yes No
5 5 NA NA Always NA
6 6 NA No NA NA
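To see the two steps in isolation, here is a minimal sketch on a single pair of vectors. The vectors `x` and `y` are illustrative stand-ins for rows 1 to 4 of Q2_1.x and Q2_1.y, not part of the original answer.

```r
library(dplyr)

# Toy stand-ins for one .x/.y pair (rows 1-4 of Q2_1 in the example data)
x <- c("Yes", "-99", "Yes", "-99")
y <- c(NA, "No", "Yes", NA)

# Step 1: turn the user-defined missing code into a real NA
x_na <- na_if(x, "-99")

# Step 2: take the first non-missing value, looking at x first and then y
merged <- coalesce(x_na, y)
merged
```

Because `na_if` runs first, a "-99" with no replacement in `y` ends up as NA rather than staying "-99", which is what the output above shows for row 4.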
EDIT
The original answer did not produce the desired output, because it replaced all user-defined NAs (-99) with real NAs.
If the OP wants to preserve these user-defined NAs, we can do as follows:
First, change all columns to character.
Second, split the data.frame into data frames paired by the prefix "Q{number}_{number}" with split.default.
Finally, modify all list elements that have two columns ('x' and 'y' pairs) with modify_if and coalesce.
Note that coalesce only fills real NAs, so a -99 in a .x column is kept even when the .y column has a value (row 2 of Q2_1 below).
library(dplyr)
library(purrr)
org_dat %>%
mutate(across(everything(), as.character)) %>%
split.default(sub('\\..$', '', names(org_dat))) %>%
modify_if(.p=~ncol(.x)==2, .f = ~coalesce(.x[[1]], .x[[2]])) %>%
bind_cols() %>%
select(ID, Q2_1, Q2_2, Q14_1, Q15)
# A tibble: 6 × 5
ID Q2_1 Q2_2 Q14_1 Q15
<chr> <chr> <chr> <chr> <chr>
1 1 Yes NA Sometimes NA
2 2 -99 NA Always Yes
3 3 Yes NA NA NA
4 4 -99 No Yes No
5 5 NA -99 Always NA
6 6 -99 No NA NA
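In the output above, row 2 of Q2_1 is still -99, whereas the desired output in the question has "No" there. A minimal sketch of one way to match the desired output follows: treat both NA and "-99" as missing, but only overwrite when the .y column actually has a value. The helper `fill_from_y` and its `missing_code` argument are hypothetical names introduced for illustration, and the tibble simply re-enters the example data from the question by hand.

```r
library(dplyr)

# Example data from the question, typed in as character columns
org_dat <- tibble(
  ID      = as.character(1:6),
  Q2_1.x  = c("Yes", "-99", "Yes", "-99", NA, "-99"),
  Q2_2.x  = c(NA, NA, NA, NA, "-99", NA),
  Q2_1.y  = c(NA, "No", "Yes", NA, NA, NA),
  Q2_2.y  = c(NA, NA, NA, "No", NA, "No"),
  Q14_1.x = c("Sometimes", NA, NA, NA, NA, NA),
  Q14_1.y = c(NA, "Always", NA, "Yes", "Always", NA),
  Q15     = c(NA, "Yes", NA, "No", NA, NA)
)

# Hypothetical helper: use y only where x is NA or the missing code
# AND y itself is not NA; otherwise keep x unchanged
fill_from_y <- function(x, y, missing_code = "-99") {
  use_y <- (is.na(x) | x == missing_code) & !is.na(y)
  if_else(use_y, y, x)
}

# Pair every .x column with its .y partner by name
x_cols <- names(org_dat)[endsWith(names(org_dat), ".x")]

dat_out <- org_dat
for (x_col in x_cols) {
  y_col <- sub("\\.x$", ".y", x_col)
  dat_out[[x_col]] <- fill_from_y(dat_out[[x_col]], dat_out[[y_col]])
}

# Drop the .y columns and strip the .x suffix
dat_out <- dat_out %>%
  select(-ends_with(".y")) %>%
  rename_with(~ sub("\\.x$", "", .x))

dat_out
```

The explicit loop is a design choice to keep the column pairing visible; the same helper could presumably be used inside across() with cur_column() instead.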
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
