'How to merge two tables based on rows and columns in R?

df1 <- data.frame(MLID=c('992','992','BJR'),
              Position=c('N0','N1','N1'),
              Weight=c(0.125,0.58,0.69))


df2 <- data.frame(MLID=c('992','992','992','992',
                     'BJR','BJR','BJR','BJR'),
              Weight=c(0,0.251,0.501,1.001,
                       0,0.251,0.501,1.001),
              N0=c(2.80,4.05,4.05,4.05,
                   4.05,4.05,4.05,4.05),
              N1=c(3.47,4.73,4.95,5.15,
                   4.73,7.73,4.95,5.15) )

What I want is merge this two tables follow the rules:

  1. the MLID
  2. Look at position(Its N0 or N1)
  3. Find the range where the weight sits in (like the approximately vlookup in excel)(2.8 means charge for weight (0,0.250) for 992 N0, 4.05 for weight (0.251,0.500) for 992 N0,3.47 for weight (0,0.250) 992 N1 etc.

So the final output should be :

MILD  Position  Weight  Charge
992      N0     0.125    2.8
992      N1     0.580    4.95
BJR      N1     0.690    4.95

Is it possible do it in R? especially in dplyr package?



Solution 1:[1]

An option using data.table rolling join can be achieved. First, df2 needs to be transformed in long-format using melt and then join both df1 and df2.

library(data.table)

setDT(df1, key = c("MLID", "Position","Weight") )

df2 <- melt(df2, id.vars = c("MLID","Weight"), variable.name = "Position", 
                                                      value.name = "Charge")

setDT(df2, key = c("MLID", "Position","Weight"))

df2[df1, roll = "nearest"]
#    MLID Weight Position Charge
# 1:  992  0.580       N1   4.95
# 2:  992  0.125       NO   2.80
# 3:  BJR  0.690       N1   4.95

Option#2: A tidyverse based approach can be as:

library(tidyverse)
df2 %>% gather(Position, Charge, -MLID, -Weight) %>%
  right_join(df1, by=c("MLID", "Position")) %>%
  filter(Weight.x <= Weight.y) %>%
  group_by(MLID, Position) %>%
  arrange(Weight.y-Weight.x) %>% 
  slice(1) %>%
  select(MLID, Weight = Weight.y, Position, Charge)

# # A tibble: 3 x 4
# # Groups: MLID, Position [3]
#   MLID  Weight Position Charge
#   <chr>  <dbl> <chr>     <dbl>
# 1 992    0.580 N1         4.95
# 2 992    0.125 NO         2.80
# 3 BJR    0.690 N1         4.95

Data:

The OP's data is slightly modified to include stringsAsFactors = FALSE argument in data.frame in order to avoid unnecessary warnings.

df1 <- data.frame(MLID=c('992','992','BJR'),
                  Position=c('NO','N1','N1'),
                  Weight=c(0.125,0.58,0.69), stringsAsFactors = FALSE)


df2 <- data.frame(MLID=c('992','992','992','992',
                         'BJR','BJR','BJR','BJR'),
                  Weight=c(0,0.251,0.501,1.001,
                           0,0.251,0.501,1.001),
                  NO=c(2.80,4.05,4.05,4.05,
                       4.05,4.05,4.05,4.05),
                  N1=c(3.47,4.73,4.95,5.15,
                       4.73,7.73,4.95,5.15), stringsAsFactors = FALSE )

Solution 2:[2]

We could use a non-equi join with data.table. Reshape the second dataset to 'long' format with melt and join with the first data on "MLID", "Position" and the non-equi comparison on the "Weight' columns and assign the last value of 'Charge' to create the column in 'df1'

library(data.table)
setDT(df1)[setnames(melt(setDT(df2), measure = c("NO", "N1"), 
       variable.name = "Position", value.name = "Charge"), "Weight", "wt"), 
      Charge := Charge, on = .(MLID, Position, Weight > wt), mult = "last"] 

df1
#   MLID Position Weight Charge
#1:  992       NO  0.125   2.80
#2:  992       N1  0.580   4.95
#3:  BJR       N1  0.690   4.95

Solution 3:[3]

Here is a base R version:

outdf <- merge(df1, df2, by = "MLID")
outdf$dist <- abs(outdf$Weight.x - outdf$Weight.y)
ting <- aggregate(dist ~ MLID + Position, FUN = function(x) min(x), data = outdf)
outdf2 <- merge(outdf, ting, by.x = c("MLID", "Position", "dist"))
outdf2$charge <- ifelse(outdf2$Position == "N1", outdf2$N1, outdf2$NO)
outdf2 <- outdf2[,c("MLID", "Position", "Weight.x", "charge")]
outdf2
# MLID Position Weight.x charge
# 1  992       N1    0.580   4.95
# 2  992       NO    0.125   2.80
# 3  BJR       N1    0.690   4.95

Solution 4:[4]

joined_df <- merge(data1, data2, by.x = "column key name in data1", 
         by.y = "column key name in data2", all.x = TRUE, all.y = FALSE)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 akrun
Solution 3 AidanGawronski
Solution 4 Muraino