Replace content in one column with another column in R

I am working with some survey data and I would like to fill the contents of one survey item/column from another survey item, while keeping the original cell contents where they exist. For example, replace Q2_1.x with Q2_1.y if Q2_1.x is missing (missing is coded either as "-99" or as NA_character_).

Here is an example of my data:

library(dplyr)
library(magrittr)
library(readr)

org_dat <- read_table('ID   Q2_1.x  Q2_2.x  Q2_1.y  Q2_2.y  Q14_1.x Q14_1.y Q15
    1   Yes NA  NA  NA  Sometimes   NA  NA
    2   -99 NA  No  NA  NA  Always  Yes
    3   Yes NA  Yes NA  NA  NA  NA
    4   -99 NA  NA  No  NA  Yes No 
    5   NA  -99 NA  NA  NA  Always  NA
    6   -99 NA  NA  No  NA  NA  NA') %>% mutate_all(as.character)

Here is my desired output:

dat_out <- read_table('ID   Q2_1    Q2_2    Q14_1   Q15
1   Yes NA  Sometimes   NA
2   No  NA  Always  Yes
3   Yes NA  NA  NA
4   -99 No  Yes No
5   NA  -99 Always  NA
6   -99 No  NA  NA')

Current solution

I know that I can replace each of these columns individually, but I have a lot of columns to deal with and I would like a smart dplyr/grepl way of solving this. Any ideas? It is always the case that I am replacing the Q*.x column with the matching Q*.y column.

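org_dat %>%
  # take the .y value when .x is missing ("-99" or NA) and .y is present;
  # the parentheses make the intended grouping of & and | explicit
  mutate(Q2_1.x = case_when(!is.na(Q2_1.y) &
                              (Q2_1.x == '-99' | is.na(Q2_1.x)) ~ Q2_1.y,
                            TRUE ~ Q2_1.x)) %>%
  mutate(Q2_2.x = case_when(!is.na(Q2_2.y) &
                              (Q2_2.x == '-99' | is.na(Q2_2.x)) ~ Q2_2.y,
                            TRUE ~ Q2_2.x)) %>%
  mutate(Q14_1.x = case_when(!is.na(Q14_1.y) &
                               (Q14_1.x == '-99' | is.na(Q14_1.x)) ~ Q14_1.y,
                             TRUE ~ Q14_1.x)) %>%
  rename(Q2_1 = Q2_1.x,
         Q2_2 = Q2_2.x,
         Q14_1 = Q14_1.x) %>%
  # drop only the .y columns; matches("x|y") would drop any name containing x or y
  select(-ends_with(".y"))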


Solution 1:[1]

The key here is to first convert the user-defined missing values ("-99") into real NAs with na_if, and then to coalesce each pair of columns.

library(dplyr)
library(stringr)
org_dat %>%
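    # apply na_if() per column: calling it on a whole data frame errors in dplyr >= 1.1.0
    mutate(across(everything(), ~na_if(.x, "-99"))) %>%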
    mutate(across(ends_with('.x'),
                  ~coalesce(.x, get(deparse(substitute(.x)) %>%
                                        str_replace('\\.x', '.y'))))) %>%
    select(-ends_with('.y')) %>%
    rename_with(~str_remove(.x, '\\..$'))


# A tibble: 6 × 5
  ID    Q2_1  Q2_2  Q14_1     Q15  
  <chr> <chr> <chr> <chr>     <chr>
1 1     Yes   NA    Sometimes NA   
2 2     No    NA    Always    Yes  
3 3     Yes   NA    NA        NA   
4 4     NA    No    Yes       No   
5 5     NA    NA    Always    NA   
6 6     NA    No    NA        NA   
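
To see the two building blocks in isolation, here is a minimal vector-level sketch. The vectors x and y are made-up illustrations, not columns from the question's data.

```r
library(dplyr)

# Hypothetical stand-ins for one .x/.y pair of survey columns
x <- c("Yes", "-99", NA)
y <- c("No", "No", "Always")

# Step 1: turn the user-defined missing code into a real NA
x_clean <- na_if(x, "-99")

# Step 2: take x where it is present, otherwise fall back to y
filled <- coalesce(x_clean, y)

print(x_clean)
print(filled)
```

Because na_if turns every "-99" into NA before coalesce runs, any "-99" that has no replacement in y ends up as NA, which is why the output above no longer contains "-99".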

EDIT

The original answer did not provide the actual desired output, because it replaced all user-defined NAs (-99) with NAs.

If the OP wants to preserve these user-defined NAs, we can proceed as follows. First, convert all columns to character. Second, split the data frame into a list of data frames grouped by the prefix "Q{number}_{number}" with split.default. Finally, use modify_if to collapse every list element that has two columns (an 'x' and 'y' pair) into one with coalesce.

library(dplyr)
library(purrr)

org_dat %>%
    mutate(across(everything(), as.character)) %>%
        split.default(sub('\\..$', '', names(org_dat))) %>%
        modify_if(.p=~ncol(.x)==2, .f = ~coalesce(.x[[1]], .x[[2]])) %>%
        bind_cols() %>%
        select(ID, Q2_1, Q2_2, Q14_1, Q15)

# A tibble: 6 × 5
  ID    Q2_1  Q2_2  Q14_1     Q15  
  <chr> <chr> <chr> <chr>     <chr>
1 1     Yes   NA    Sometimes NA   
2 2     -99   NA    Always    Yes  
3 3     Yes   NA    NA        NA   
4 4     -99   No    Yes       No   
5 5     NA    -99   Always    NA   
6 6     -99   No    NA        NA   
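
Compared with the desired output in the question, row 2 of Q2_1 is still -99 here, because coalesce only fills real NAs. The following is a hedged sketch of one way to reproduce the asker's rule (replace a missing .x value, coded as NA or "-99", only when the paired .y value is present). The helper fill_from_y is a hypothetical name introduced for illustration, the data frame is rebuilt by hand from the question's example, and the get(cur_column()) lookup assumes that every .x column has a matching .y column.

```r
library(dplyr)
library(stringr)

# The question's example data, rebuilt by hand as character columns
org_dat <- tibble(
  ID      = as.character(1:6),
  Q2_1.x  = c("Yes", "-99", "Yes", "-99", NA, "-99"),
  Q2_2.x  = c(NA, NA, NA, NA, "-99", NA),
  Q2_1.y  = c(NA, "No", "Yes", NA, NA, NA),
  Q2_2.y  = c(NA, NA, NA, "No", NA, "No"),
  Q14_1.x = c("Sometimes", NA, NA, NA, NA, NA),
  Q14_1.y = c(NA, "Always", NA, "Yes", "Always", NA),
  Q15     = c(NA, "Yes", NA, "No", NA, NA)
)

# Hypothetical helper: treat NA and "-99" in x as missing,
# but only overwrite when y actually has a value
fill_from_y <- function(x, y) {
  x_missing <- is.na(x) | x == "-99"
  if_else(x_missing & !is.na(y), y, x)
}

dat_out <- org_dat %>%
  # for each .x column, look up its .y partner by name
  mutate(across(ends_with(".x"),
                ~ fill_from_y(.x, get(str_replace(cur_column(), "\\.x$", ".y"))))) %>%
  select(-ends_with(".y")) %>%
  # strip the .x suffix; other column names are left unchanged
  rename_with(~ str_remove(.x, "\\.x$"))

print(dat_out)
```

Keeping the rule in a small helper means the "-99" code is never converted to NA, so cells with no usable .y value retain their original content.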

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1