'How to fill one data frame with data from another while retaining NAs from the first
I have 2 data frames with the same column names, but different numbers of rows. The first data frame (a) looks similar to this:
a = data.frame("Site"=c(1,2,3,4,7,9,10,11,13,14),
"v1"=c(0,0,0,0,0,0,0,0,0,0),
"v2"=c(0,0,0,0,NA,NA,NA,0,0,0),
"v3"=c(0,0,0,NA,0,NA,0,0,0,0),
"v4"=c(0,0,0,0,0,0,0,0,NA,NA),
"v5"=c(0,0,0,0,0,NA,0,NA,0,0))
Note: sites 5, 6, 8, and 12 are missing purposefully.
The second data frame (b) looks something like this:
b = data.frame("Site"=c(2,3,4,7,10,14),
"v1"=c(1,NA,2,1,NA,NA),
"v2"=c(1,1,NA,NA,NA,NA),
"v3"=c(NA,1,NA,NA,NA,1),
"v4"=c(1,NA,4,1,NA,NA),
"v5"=c(1,NA,2,1,1,3))
What I want to achieve is this:
desired = data.frame("Site"=c(1,2,3,4,7,9,10,11,13,14),
"v1"=c(0,1,0,2,1,0,0,0,0,0),
"v2"=c(0,1,1,0,NA,NA,NA,0,0,0),
"v3"=c(0,0,1,NA,0,NA,0,0,0,1),
"v4"=c(0,1,0,4,1,0,0,0,NA,NA),
"v5"=c(0,1,0,2,1,NA,1,NA,0,3))
Where I "inject" (I'm sure there's a better term) the data from data frame b into data frame a, however I'd like to replace any NAs from b with zeros and keep the NAs from a as they are.
I found and have tried this code:
cols <- colnames(a)[colnames(a) %in% colnames(b)]
rows <- rownames(a)[rownames(a) %in% rownames(b)]
a[rows, cols] <- b[rows, cols]
But it brings the NAs along with it. I considered replacing the NAs with zeros first, but even then it would erase the NAs I currently have in data frame a that I want to keep.
Perhaps a for loop or something in tidyverse is the way to go, but I don't even know where to begin with those. Any help would be much appreciated!
Solution 1:[1]
merge(b, a, by = 'Site', all = TRUE) %>%
split.default(sub('.x|.y', '', names(.))) %>%
map_df(~coalesce(!!!.x))
# A tibble: 10 x 6
Site v1 v2 v3 v4 v5
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0 0 0 0 0
2 2 1 1 0 1 1
3 3 0 1 1 0 0
4 4 2 0 NA 4 2
5 7 1 NA 0 1 1
6 9 0 NA NA 0 NA
7 10 0 NA 0 0 1
8 11 0 0 0 0 NA
9 13 0 0 0 NA 0
10 14 0 0 1 NA 3
Solution 2:[2]
I suggest you first replace every NA values in b with 0 and then use inner_join to merge the result with the corresponding Site values in a. You could then replace non-NA values of a with their corresponding values in b leaving NA values in a intact. In the end we bind the modified data frame with a subset of a whose Site values are not present in b.
library(dplyr)
a %>%
inner_join(b %>%
mutate(across(!Site, ~ replace(.x, is.na(.x), 0))),
by = "Site") %>%
mutate(across(ends_with(".x"), ~ ifelse(!is.na(.x), get(gsub("(.*\\.)x", "\\1y", cur_column())),
.x))) %>%
select(!ends_with("y")) %>%
rename_with(~ gsub("(.*)\\.x", "\\1", .), ends_with(".x")) %>%
bind_rows(a %>%
filter(!Site %in% unique(b$Site))) %>%
arrange(Site)
Site v1 v2 v3 v4 v5
1 1 0 0 0 0 0
2 2 1 1 0 1 1
3 3 0 1 1 0 0
4 4 2 0 NA 4 2
5 7 1 NA 0 1 1
6 9 0 NA NA 0 NA
7 10 0 NA 0 0 1
8 11 0 0 0 0 NA
9 13 0 0 0 NA 0
10 14 0 0 1 NA 3
Also a brilliant and concise solution suggested by my brilliant friend Onyambu:
rbind(a, b) %>%
group_by(Site) %>%
summarise(across(everything(), ~
if(any(!is.na(.x))) max(.x, na.rm = TRUE) else NA))
Solution 3:[3]
i <- match(b$Site, a$Site)
a_nas <- is.na(a)
for (j in seq(2, ncol(a))) {
a[i, j] <- ifelse(is.na(b[[j]]), 0, b[[j]])
}
a[a_nas] <- NA
all.equal(desired, a)
# [1] TRUE
Solution 4:[4]
We can use {powerjoin}
library(powerjoin)
power_full_join(a, b, by = "Site", conflict = ~ifelse(is.na(.x), NA, coalesce_yx(.x, .y)))
#> Site v1 v2 v3 v4 v5
#> 1 1 0 0 0 0 0
#> 2 2 1 1 0 1 1
#> 3 3 0 1 1 0 0
#> 4 4 2 0 NA 4 2
#> 5 7 1 NA 0 1 1
#> 6 9 0 NA NA 0 NA
#> 7 10 0 NA 0 0 1
#> 8 11 0 0 0 0 NA
#> 9 13 0 0 0 NA 0
#> 10 14 0 0 1 NA 3
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | onyambu |
| Solution 2 | |
| Solution 3 | IceCreamToucan |
| Solution 4 | moodymudskipper |
