'R: Missing data on table, complete it by referencing partial matches to a "Reference" table

I have two tables; "Reference" and "TableA".

I am looking through TableA which is an incomplete table and would like to turn it into a "complete" table by referencing the "Reference" table, filling in missing values, and/or adding rows where there are multiple matches are found.

Reproducible example of "Reference" and "TableA" are below:

A <- c(1,1,1,2,4,4,5,5,7,6,2,1)
B <- c(1,2,2,2,4,4,9,5,8,6,2,9)
C <- c(1,1,3,3,4,5,5,5,7,6,3,3)
D <- c(1,2,1,1,2,1,2,1,2,2,2,1)

Reference <- data.frame(A,B,C,D)

A <- c(NA,1,5,2,4,1)
B <- c(NA,2,NA,2,NA,1)
C <- c(3,NA,5,NA,NA,1)
D <- c(1,1,2,2,1,1)

TableA <- data.frame(A,B,C,D)

I have attempted to resolve this by doing the following:

for (i in 1:dim(TableA)[1])
{
  tmp<-TableA[i,]
  repet<-ifelse(is.na(TableA$D[i]), Reference, 1 )
  for (j in 1:repet) {
    tmp$D<-ifelse(repet>1, Reference$D[j,], tmp$D)
    collector<-rbind(collector, tmp)
    }
}
collector

However, this solution will return the entirety of Reference$D, but I would only like to return those records from Reference$D whose columns A,B,C match (or partially match) what is on TableA.

For example, in Row 1 of TableA, I would like to replace Row 1 with the Reference table's rows 3,4, and 12.

Expected output below.

Note that the Reference table combination 1,2,3,1 appears twice on the expected output as it is a match for both rows 1 & 2 of TableA.

A B C D
1 2 3 1
2 2 3 1
1 9 3 1
1 2 3 1
5 9 5 2
2 2 3 2
4 4 5 1
1 1 1 1


Solution 1:[1]

I'll first create an extra column "string" in both TableA and Reference, with NA replaced with a dot . in TableA, which would be used in regex matching.

Then find out which string in TableA appeared in Reference, and store them in a matrix.

Finally, replicate the lgl_matrix row number by the number of matches, and use those row numbers as index in Reference.

library(tidyverse)

TableA <- TableA %>% 
  mutate(across(A:D, ~ replace_na(as.character(.x), "."))) %>% 
  rowwise() %>% 
  mutate(string = paste0(c_across(A:D), collapse = ""))

Reference <- Reference %>% 
  rowwise() %>% 
  mutate(string = paste0(c_across(A:D), collapse = ""))

lgl_matrix <- sapply(TableA$string, grepl, x = Reference$string)

Reference[rep(1:nrow(lgl_matrix), rowSums(lgl_matrix)), -5]

# A tibble: 8 x 4
# Rowwise: 
      A     B     C     D
  <dbl> <dbl> <dbl> <dbl>
1     1     1     1     1
2     1     2     3     1
3     1     2     3     1
4     2     2     3     1
5     4     4     5     1
6     5     9     5     2
7     2     2     3     2
8     1     9     3     1

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 benson23