'How to replace values with IDs in 2 dataframes that are identical
I have two data frames with different length, that are identical in terms of columns (46 col.). But the values are different. So for example I have one ID 123 that has a value of 5 in the first data frame and 3 in the other data frame. My df consists of approx. 2000 observations and 400 obs. I want to identify the same IDs and replace the values from one df into the other with the IDs.
I tried to merge my two dfs by ID:
data_merge = right_join(data_new, data.raw, by = "ID")
So I got the identification, but as well a df of 93 var.
How can I replace the right values from data_new (400 obs.) with data.raw (2000 obs.)?
Suggested Toy Data
data.raw <- data.frame(ID = 1:3,
x = rep(1, 3),
y = rep(2, 3),
z = rep(3, 3))
data_new <- data.frame(ID = 2,
x = 1,
y = 7,
z = 3)
Solution 1:[1]
This can be done (if I understand your question correct) using a data.table update join (join by reference).
library(data.table)
# convert to data.table type
setDT(data.raw); setDT(data_new)
# names of the columns to update
colsUpdate <- names(data.raw)[-1]
# perform the update, replace the columns colsUpdate in data.raw with
# the values of the columsn in colsUpdate from data_new, if IDs match
data.raw[data_new, (colsUpdate) := mget(paste0("i.", colsUpdate)), on = .(ID)][]
# ID x y z
# 1: 1 1 2 3
# 2: 2 1 7 3
# 3: 3 1 2 3
Solution 2:[2]
Here's a quasi tidyverse way of doing this. You'll have to forgive my use of reshape2::dcast. I'm still struggling to grasp the finer points of pivot_ functions.
library(dplyr)
library(tidyr)
library(reshape2)
data_merge <- left_join(data.raw,
data_new,
by = "ID",
suffix = c("_raw", "_new")) %>%
pivot_longer(-ID,
names_to = c("variable", "condition"),
names_sep = "_") %>%
dcast(ID + variable ~ condition,
value.var = "value") %>%
mutate(new = ifelse(is.na(new), raw, new)) %>%
pivot_wider(id_cols = ID,
names_from = "variable",
values_from = "new")
Solution 3:[3]
Subtract ID matches, then merge.
merge(data.raw[-match(data_new$ID, data.raw$ID), ], data_new, all=T)
# ID x y z
# 1 1 1 2 3
# 2 2 1 7 3
# 3 3 1 2 3
Data:
data.raw <- structure(list(ID = 1:3, x = c(1, 1, 1), y = c(2, 2, 2), z = c(3,
3, 3)), class = "data.frame", row.names = c(NA, -3L))
data_new <- structure(list(ID = 2, x = 1, y = 7, z = 3), class = "data.frame", row.names = c(NA,
-1L))
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Benjamin |
| Solution 3 | jay.sf |
