'How to replace values with IDs in 2 dataframes that are identical

I have two data frames with different length, that are identical in terms of columns (46 col.). But the values are different. So for example I have one ID 123 that has a value of 5 in the first data frame and 3 in the other data frame. My df consists of approx. 2000 observations and 400 obs. I want to identify the same IDs and replace the values from one df into the other with the IDs.

I tried to merge my two dfs by ID:

data_merge = right_join(data_new, data.raw, by = "ID")

So I got the identification, but as well a df of 93 var.

How can I replace the right values from data_new (400 obs.) with data.raw (2000 obs.)?

Suggested Toy Data

data.raw <- data.frame(ID = 1:3, 
                       x = rep(1, 3), 
                       y = rep(2, 3), 
                       z = rep(3, 3))

data_new <- data.frame(ID = 2, 
                       x = 1, 
                       y = 7, 
                       z = 3)

Solution 1:^[1]

This can be done (if I understand your question correct) using a data.table update join (join by reference).

library(data.table)
# convert to data.table type
setDT(data.raw); setDT(data_new)
# names of the columns to update
colsUpdate <- names(data.raw)[-1]
# perform the update, replace the columns colsUpdate in data.raw with
#  the values of the columsn in colsUpdate from data_new, if IDs match
data.raw[data_new, (colsUpdate) := mget(paste0("i.", colsUpdate)), on = .(ID)][]
#    ID x y z
# 1:  1 1 2 3
# 2:  2 1 7 3
# 3:  3 1 2 3

Solution 2:^[2]

Here's a quasi tidyverse way of doing this. You'll have to forgive my use of reshape2::dcast. I'm still struggling to grasp the finer points of pivot_ functions.

library(dplyr)
library(tidyr)
library(reshape2)

data_merge <- left_join(data.raw, 
                        data_new, 
                        by = "ID", 
                        suffix = c("_raw", "_new")) %>% 
  pivot_longer(-ID, 
               names_to = c("variable", "condition"), 
               names_sep = "_") %>% 
  dcast(ID + variable ~ condition, 
        value.var = "value") %>% 
  mutate(new = ifelse(is.na(new), raw, new)) %>% 
  pivot_wider(id_cols = ID, 
              names_from = "variable", 
              values_from = "new")

Solution 3:^[3]

Subtract ID matches, then merge.

merge(data.raw[-match(data_new$ID, data.raw$ID), ], data_new, all=T)
#   ID x y z
# 1  1 1 2 3
# 2  2 1 7 3
# 3  3 1 2 3

Data:

data.raw <- structure(list(ID = 1:3, x = c(1, 1, 1), y = c(2, 2, 2), z = c(3, 
3, 3)), class = "data.frame", row.names = c(NA, -3L))

data_new <- structure(list(ID = 2, x = 1, y = 7, z = 3), class = "data.frame", row.names = c(NA, 
-1L))

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1
Solution 2	Benjamin
Solution 3	jay.sf

'How to replace values with IDs in 2 dataframes that are identical

Suggested Toy Data

Solution 1:[1]

Solution 2:[2]

Solution 3:[3]

Sources

Related Questions

Solution 1:^[1]

Solution 2:^[2]

Solution 3:^[3]