Match the column value of a dataframe to another, and if no match, the old value stays as it is
I have dataframe A like this:
Sample1
Salmon
Mouse
Rooster
Monkey
My dataframe B is like below:
Sample1  Sample2
Rooster  Bird
Mouse    Rodent
Salmon   Fish
I would like the Sample2 column of my final dataframe to be filled in by matching Sample1 of dataframe A against Sample1 of dataframe B. For this, I have used this command:
final_df$Sample2 <- dataframe_B$Sample2[match(dataframe_A$Sample1, dataframe_B$Sample1)]
The command works, but when there is no match, as with Monkey here, NA is returned. How can I modify my code so that the original value (Monkey, for example) is returned when there is no match? My real dataset has thousands of rows. Thanks.
In short, my final dataframe currently looks like the one below. I don't want NA to be shown for Monkey; I'd like Monkey itself to appear there. This is just an example, and the same should apply to every value that has no match:
Sample1  Sample2
Salmon   Fish
Mouse    Rodent
Rooster  Bird
Monkey   NA
Solution 1:[1]
I'm not sure I fully understand the question, but does merge() work for you? The last assignment fills every NA in Sample2 with the corresponding Sample1 value.
# Example data from the question
dataframe_A = data.frame(
  stringsAsFactors = FALSE,
  Sample1 = c("Salmon", "Mouse", "Rooster", "Monkey")
)
dataframe_B = data.frame(
  stringsAsFactors = FALSE,
  Sample1 = c("Rooster", "Mouse", "Salmon"),
  Sample2 = c("Bird", "Rodent", "Fish")
)
# Left join on the shared Sample1 column: keep every row of dataframe_A
dataframe_C = merge(
  dataframe_A,
  dataframe_B,
  all.x = TRUE
)
# Where no match was found, fall back to the Sample1 value
dataframe_C$Sample2[is.na(dataframe_C$Sample2)] = dataframe_C$Sample1[is.na(dataframe_C$Sample2)]
dataframe_C
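Note that merge() sorts the result by the key column by default, so the rows may come back in a different order than in dataframe_A. If the original order matters, a minimal sketch that stays close to the match() call from the question could look like the following. The helper name idx is only illustrative, and the sketch assumes both columns are character vectors (not factors):

```r
dataframe_A <- data.frame(Sample1 = c("Salmon", "Mouse", "Rooster", "Monkey"),
                          stringsAsFactors = FALSE)
dataframe_B <- data.frame(Sample1 = c("Rooster", "Mouse", "Salmon"),
                          Sample2 = c("Bird", "Rodent", "Fish"),
                          stringsAsFactors = FALSE)

# Position of each dataframe_A key in dataframe_B (NA where there is no match)
idx <- match(dataframe_A$Sample1, dataframe_B$Sample1)

final_df <- dataframe_A
# Use the looked-up Sample2 where a match exists, otherwise keep Sample1
final_df$Sample2 <- ifelse(is.na(idx), dataframe_A$Sample1, dataframe_B$Sample2[idx])
final_df
```

Because match() and ifelse() are vectorized, this should scale to thousands of rows without an explicit loop.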
Solution 2:[2]
If I understand you correctly, you can just do left_join like this:
library(dplyr)
df1 %>%
  left_join(., df2, by = "Sample1")
Output:
Sample1 Sample2
1 Salmon Fish
2 Mouse Rodent
3 Rooster Bird
4 Monkey <NA>
Data
df1 <- data.frame(Sample1 = c("Salmon", "Mouse", "Rooster", "Monkey"))
df2 <- data.frame(Sample1 = c("Rooster", "Mouse", "Salmon"),
                  Sample2 = c("Bird", "Rodent", "Fish"))
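The join output above still shows <NA> for Monkey, which is what the question wants to avoid. One possible follow-up, sketched here under the assumption that both columns are character vectors, is to pass the joined column through dplyr's coalesce(), which takes the first non-missing value at each position. The name result is only illustrative:

```r
library(dplyr)

df1 <- data.frame(Sample1 = c("Salmon", "Mouse", "Rooster", "Monkey"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Sample1 = c("Rooster", "Mouse", "Salmon"),
                  Sample2 = c("Bird", "Rodent", "Fish"),
                  stringsAsFactors = FALSE)

result <- df1 %>%
  left_join(df2, by = "Sample1") %>%
  # Where the join found no Sample2, fall back to Sample1
  mutate(Sample2 = coalesce(Sample2, Sample1))
result
```

left_join() keeps the rows of df1 in their original order, so no re-sorting is needed afterwards.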
Solution 3:[3]
Given
a <- data.frame(sample1 = c("Salmon", "Mouse", "Rooster", "Monkey"))
and
b <- data.frame(sample1 = c("Rooster", "Mouse", "Salmon"), sample2 = c("Bird", "Rodent", "Fish"))
you can build the combined vector with
c <- c(a$sample1[match(b$sample1, a$sample1)], a$sample1[which(!a$sample1 %in% b$sample1)])
This uses which() and ! to pick out the values of a$sample1 that have no match in b$sample1.
You can put the result into a data.frame like this:
data.frame(c = c(a$sample1[match(b$sample1, a$sample1)], a$sample1[which(!a$sample1 %in% b$sample1)]))
c
1 Rooster
2 Mouse
3 Salmon
4 Monkey
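The vector above holds the sample1 names, ordered with the matched ones first. If the goal is the sample2 column from the question, the same %in% and match() ingredients can be rearranged as in the sketch below. The helper name hit is an assumption made for illustration, and the columns are assumed to be character vectors:

```r
a <- data.frame(sample1 = c("Salmon", "Mouse", "Rooster", "Monkey"),
                stringsAsFactors = FALSE)
b <- data.frame(sample1 = c("Rooster", "Mouse", "Salmon"),
                sample2 = c("Bird", "Rodent", "Fish"),
                stringsAsFactors = FALSE)

# Start from the original values, so unmatched rows keep them
a$sample2 <- a$sample1

# Logical flag: TRUE where the key also exists in b
hit <- a$sample1 %in% b$sample1

# Overwrite only the matched rows with the looked-up sample2 value
a$sample2[hit] <- b$sample2[match(a$sample1[hit], b$sample1)]
a
```

This keeps the row order of a and leaves Monkey in place of NA.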
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | deadfate-sky |
| Solution 2 | Quinten |
| Solution 3 | Jahi Zamy |
