'R: Missing data on table, complete it by referencing partial matches to a "Reference" table
I have two tables; "Reference" and "TableA".
I am looking through TableA which is an incomplete table and would like to turn it into a "complete" table by referencing the "Reference" table, filling in missing values, and/or adding rows where there are multiple matches are found.
Reproducible example of "Reference" and "TableA" are below:
A <- c(1,1,1,2,4,4,5,5,7,6,2,1)
B <- c(1,2,2,2,4,4,9,5,8,6,2,9)
C <- c(1,1,3,3,4,5,5,5,7,6,3,3)
D <- c(1,2,1,1,2,1,2,1,2,2,2,1)
Reference <- data.frame(A,B,C,D)
A <- c(NA,1,5,2,4,1)
B <- c(NA,2,NA,2,NA,1)
C <- c(3,NA,5,NA,NA,1)
D <- c(1,1,2,2,1,1)
TableA <- data.frame(A,B,C,D)
I have attempted to resolve this by doing the following:
for (i in 1:dim(TableA)[1])
{
tmp<-TableA[i,]
repet<-ifelse(is.na(TableA$D[i]), Reference, 1 )
for (j in 1:repet) {
tmp$D<-ifelse(repet>1, Reference$D[j,], tmp$D)
collector<-rbind(collector, tmp)
}
}
collector
However, this solution will return the entirety of Reference$D, but I would only like to return those records from Reference$D whose columns A,B,C match (or partially match) what is on TableA.
For example, in Row 1 of TableA, I would like to replace Row 1 with the Reference table's rows 3,4, and 12.
Expected output below.
Note that the Reference table combination 1,2,3,1 appears twice on the expected output as it is a match for both rows 1 & 2 of TableA.
A | B | C | D |
---|---|---|---|
1 | 2 | 3 | 1 |
2 | 2 | 3 | 1 |
1 | 9 | 3 | 1 |
1 | 2 | 3 | 1 |
5 | 9 | 5 | 2 |
2 | 2 | 3 | 2 |
4 | 4 | 5 | 1 |
1 | 1 | 1 | 1 |
Solution 1:[1]
I'll first create an extra column "string" in both TableA
and Reference
, with NA
replaced with a dot .
in TableA
, which would be used in regex matching.
Then find out which string
in TableA
appeared in Reference
, and store them in a matrix.
Finally, replicate the lgl_matrix
row number by the number of matches, and use those row numbers as index in Reference
.
library(tidyverse)
TableA <- TableA %>%
mutate(across(A:D, ~ replace_na(as.character(.x), "."))) %>%
rowwise() %>%
mutate(string = paste0(c_across(A:D), collapse = ""))
Reference <- Reference %>%
rowwise() %>%
mutate(string = paste0(c_across(A:D), collapse = ""))
lgl_matrix <- sapply(TableA$string, grepl, x = Reference$string)
Reference[rep(1:nrow(lgl_matrix), rowSums(lgl_matrix)), -5]
# A tibble: 8 x 4
# Rowwise:
A B C D
<dbl> <dbl> <dbl> <dbl>
1 1 1 1 1
2 1 2 3 1
3 1 2 3 1
4 2 2 3 1
5 4 4 5 1
6 5 9 5 2
7 2 2 3 2
8 1 9 3 1
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | benson23 |