'How to fill one data frame with data from another while retaining NAs from the first

I have 2 data frames with the same column names, but different numbers of rows. The first data frame (a) looks similar to this:

a = data.frame("Site"=c(1,2,3,4,7,9,10,11,13,14), 
               "v1"=c(0,0,0,0,0,0,0,0,0,0), 
               "v2"=c(0,0,0,0,NA,NA,NA,0,0,0), 
               "v3"=c(0,0,0,NA,0,NA,0,0,0,0), 
               "v4"=c(0,0,0,0,0,0,0,0,NA,NA), 
               "v5"=c(0,0,0,0,0,NA,0,NA,0,0)) 

Note: sites 5, 6, 8, and 12 are missing purposefully.

The second data frame (b) looks something like this:

b = data.frame("Site"=c(2,3,4,7,10,14),
               "v1"=c(1,NA,2,1,NA,NA),
               "v2"=c(1,1,NA,NA,NA,NA),
               "v3"=c(NA,1,NA,NA,NA,1),
               "v4"=c(1,NA,4,1,NA,NA),
               "v5"=c(1,NA,2,1,1,3))

What I want to achieve is this:

desired = data.frame("Site"=c(1,2,3,4,7,9,10,11,13,14), 
                     "v1"=c(0,1,0,2,1,0,0,0,0,0), 
                     "v2"=c(0,1,1,0,NA,NA,NA,0,0,0), 
                     "v3"=c(0,0,1,NA,0,NA,0,0,0,1), 
                     "v4"=c(0,1,0,4,1,0,0,0,NA,NA), 
                     "v5"=c(0,1,0,2,1,NA,1,NA,0,3))

Where I "inject" (I'm sure there's a better term) the data from data frame b into data frame a, however I'd like to replace any NAs from b with zeros and keep the NAs from a as they are.

I found and have tried this code:

cols <- colnames(a)[colnames(a) %in% colnames(b)]
rows <- rownames(a)[rownames(a) %in% rownames(b)]

a[rows, cols] <- b[rows, cols]

But it brings the NAs along with it. I considered replacing the NAs with zeros first, but even then it would erase the NAs I currently have in data frame a that I want to keep.

Perhaps a for loop or something in tidyverse is the way to go, but I don't even know where to begin with those. Any help would be much appreciated!



Solution 1:[1]

merge(b, a, by = 'Site', all = TRUE) %>%
  split.default(sub('.x|.y', '', names(.))) %>%
  map_df(~coalesce(!!!.x))

# A tibble: 10 x 6
    Site    v1    v2    v3    v4    v5
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1     1     0     0     0     0     0
 2     2     1     1     0     1     1
 3     3     0     1     1     0     0
 4     4     2     0    NA     4     2
 5     7     1    NA     0     1     1
 6     9     0    NA    NA     0    NA
 7    10     0    NA     0     0     1
 8    11     0     0     0     0    NA
 9    13     0     0     0    NA     0
10    14     0     0     1    NA     3

Solution 2:[2]

I suggest you first replace every NA values in b with 0 and then use inner_join to merge the result with the corresponding Site values in a. You could then replace non-NA values of a with their corresponding values in b leaving NA values in a intact. In the end we bind the modified data frame with a subset of a whose Site values are not present in b.

library(dplyr)

a %>%
  inner_join(b %>%
               mutate(across(!Site, ~ replace(.x, is.na(.x), 0))), 
             by = "Site") %>%
  mutate(across(ends_with(".x"), ~ ifelse(!is.na(.x), get(gsub("(.*\\.)x", "\\1y", cur_column())), 
                                          .x))) %>%
  select(!ends_with("y")) %>%
  rename_with(~ gsub("(.*)\\.x", "\\1", .), ends_with(".x")) %>%
  bind_rows(a %>% 
              filter(!Site %in% unique(b$Site))) %>%
  arrange(Site)


   Site v1 v2 v3 v4 v5
1     1  0  0  0  0  0
2     2  1  1  0  1  1
3     3  0  1  1  0  0
4     4  2  0 NA  4  2
5     7  1 NA  0  1  1
6     9  0 NA NA  0 NA
7    10  0 NA  0  0  1
8    11  0  0  0  0 NA
9    13  0  0  0 NA  0
10   14  0  0  1 NA  3

Also a brilliant and concise solution suggested by my brilliant friend Onyambu:

rbind(a, b) %>% 
  group_by(Site) %>%  
  summarise(across(everything(), ~ 
                     if(any(!is.na(.x))) max(.x, na.rm = TRUE) else NA))

Solution 3:[3]

i <- match(b$Site, a$Site)
a_nas <- is.na(a)
for (j in seq(2, ncol(a))) {
  a[i, j] <- ifelse(is.na(b[[j]]), 0, b[[j]])
}
a[a_nas] <- NA

all.equal(desired, a)
# [1] TRUE

Solution 4:[4]

We can use {powerjoin}

library(powerjoin)
power_full_join(a, b, by = "Site", conflict = ~ifelse(is.na(.x), NA, coalesce_yx(.x, .y)))
#>    Site v1 v2 v3 v4 v5
#> 1     1  0  0  0  0  0
#> 2     2  1  1  0  1  1
#> 3     3  0  1  1  0  0
#> 4     4  2  0 NA  4  2
#> 5     7  1 NA  0  1  1
#> 6     9  0 NA NA  0 NA
#> 7    10  0 NA  0  0  1
#> 8    11  0  0  0  0 NA
#> 9    13  0  0  0 NA  0
#> 10   14  0  0  1 NA  3

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 onyambu
Solution 2
Solution 3 IceCreamToucan
Solution 4 moodymudskipper