Match a column of one dataframe against another, keeping the old value when there is no match

I have dataframe A like this:

    Sample1
    Salmon
    Mouse
    Rooster
    Monkey

My dataframe B is like below:

    Sample1 Sample2
    Rooster  Bird     
    Mouse    Rodent
    Salmon   Fish

I would like the Sample2 column of my final dataframe to be filled in by matching values between the two dataframes. For this, I have used this command:

final_df$Sample2 <- dataframe_B$Sample2[match(dataframe_A$Sample1, dataframe_B$Sample1)]

The command works, but when there is no substitute, as with Monkey here, NA is returned. How can I modify my code so that the original value (Monkey, for example) is returned when there is no match? My real dataset has thousands of rows. Thanks.

In short, my final dataframe currently looks like the one below. I don't want NA to be shown for Monkey; I'd like Monkey itself to appear there. This is just an example from thousands of rows, and I want the same to apply to any value that has no match:

    Sample1   Sample2
    Salmon    Fish
    Mouse     Rodent
    Rooster   Bird
    Monkey    NA


Solution 1:[1]

I'm not sure what your question is, but does merge() work for you?

dataframe_A = data.frame(
  stringsAsFactors = FALSE,
           Sample1 = c("Salmon", "Mouse", "Rooster", "Monkey")
)

dataframe_B = data.frame(
  stringsAsFactors = FALSE,
  Sample1 = c("Rooster",  "Mouse", "Salmon"),
  Sample2 = c("Bird", "Rodent", "Fish")
)

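# Left join on the shared Sample1 column; all.x = TRUE keeps unmatched rows of dataframe_A
# (note that merge() sorts the result by the key column by default)
dataframe_C = merge(
  dataframe_A, 
  dataframe_B, 
  all.x = TRUE
)
# Where no match was found, fall back to the original Sample1 value
dataframe_C$Sample2[is.na(dataframe_C$Sample2)] = dataframe_C$Sample1[is.na(dataframe_C$Sample2)]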

dataframe_C

Solution 2:[2]

If I understand you correctly, you can just do left_join like this:

library(dplyr)
df1 %>%
  left_join(., df2, by = "Sample1")

Output:

  Sample1 Sample2
1  Salmon    Fish
2   Mouse  Rodent
3 Rooster    Bird
4  Monkey    <NA>

Data

df1 <- data.frame(Sample1 = c("Salmon", "Mouse", "Rooster", "Monkey"))
df2 <- data.frame(Sample1 = c("Rooster", "Mouse", "Salmon"),
                  Sample2 = c("Bird", "Rodent", "Fish"))
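The join on its own still leaves NA for Monkey, which is what the question wants to avoid. One possible follow-up step, sketched here on the assumption that dplyr is installed, is to fill the missing Sample2 values from Sample1 with coalesce(). The name `result` is only illustrative, and stringsAsFactors = FALSE is set explicitly so that both columns are character vectors on older R versions too.

```r
library(dplyr)

# Example data from this answer, with character columns forced explicitly
df1 <- data.frame(Sample1 = c("Salmon", "Mouse", "Rooster", "Monkey"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Sample1 = c("Rooster", "Mouse", "Salmon"),
                  Sample2 = c("Bird", "Rodent", "Fish"),
                  stringsAsFactors = FALSE)

result <- df1 %>%
  # Keep every row of df1; unmatched rows get NA in Sample2
  left_join(df2, by = "Sample1") %>%
  # Replace each NA in Sample2 with the Sample1 value from the same row
  mutate(Sample2 = coalesce(Sample2, Sample1))

result
```

With this data, the Monkey row should end up with Monkey in both columns, while the other rows keep their matched labels.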

Solution 3:[3]

If

a <- data.frame(sample1 = c("Salmon", "Mouse", "Rooster", "Monkey"))

and

b <- data.frame(sample1 = c("Rooster", "Mouse", "Salmon"), sample2 = c("Bird", "Rodent", "Fish"))

then

c <- c(a$sample1[match(b$sample1, a$sample1)], a$sample1[which(!a$sample1 %in% b$sample1)])

Here which and ! pick out the values of a$sample1 that have no match in b$sample1, and these are appended after the matched values.

You can put the result into a data.frame like this:


data.frame(c = c(a$sample1[match(b$sample1, a$sample1)], a$sample1[which(!a$sample1 %in% b$sample1)]))

        c
1 Rooster
2   Mouse
3  Salmon
4  Monkey
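Note that the vector built above holds only sample1 values, in the row order of b, so it does not yet carry the sample2 labels the question asks for. A minimal base R sketch of a match-based variant that looks up sample2 and falls back to the original value might look like the following. The names `idx` and `result` are introduced here purely for illustration.

```r
# Example data as defined in this answer, with character columns forced explicitly
a <- data.frame(sample1 = c("Salmon", "Mouse", "Rooster", "Monkey"),
                stringsAsFactors = FALSE)
b <- data.frame(sample1 = c("Rooster", "Mouse", "Salmon"),
                sample2 = c("Bird", "Rodent", "Fish"),
                stringsAsFactors = FALSE)

# Position of each a$sample1 value within b$sample1 (NA when it is absent)
idx <- match(a$sample1, b$sample1)

result <- a
# Take the looked-up sample2 where a match exists, otherwise keep sample1
result$sample2 <- ifelse(is.na(idx), a$sample1, b$sample2[idx])

result
```

This keeps the row order of a and needs no extra packages, which may suit a dataset with thousands of rows.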

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution     Source
Solution 1   deadfate-sky
Solution 2   Quinten
Solution 3   Jahi Zamy