'In R, search for several unique IDs from one dataset in another

I am trying to create a new column in a dataset. I want this column to be a "yes" or "no" column. Let's say that I have one dataset that has 1000 rows including a unique ID and another dataset that has 200 rows including a unique ID. The unique ID's match between the datasets because both datasets are from the same database.

I want to create a column in the larger dataset based on a search criteria, I want to search the unique IDs in the larger dataset and the new column will say "yes" for any of the unique IDs that also belong in the unique ID column in the smaller dataset. Basically if the ID is found in the small and the large dataset it will say Yes and if not then No.

Example: enter image description here This is what I want. Except in my case the 2 columns will be in different datasets.

I've tried to do this in R and even in Excel. I've tried merging the 2 datasets by the ID column but that doesn't get me what I want, which is a new column "yes" or "no" if the ID is found in both datasets. What should I do? I think I can use the %>% to solve my problem but I'm lost where to start..



Solution 1:[1]

The key here is checking which value is contained in the other dataset. Basically a conditional operation comparing between the two vectors of IDs, in this case can be easily solved using %in%:

Data:

# Dataset with 5 letters and values
dat <- data.frame(
  id = LETTERS[1:5],
  val = 1:5
)

# Subset
minidat <- dat[4,]
  1. Base R
new_dat <- dat # Or modify in place
new_dat$is_in_smaller <- ifelse(dat$id %in% minidat$id, "yes", "no")

new_dat
##   id val is_in_smaller
## 1  A   1            no
## 2  B   2            no
## 3  C   3            no
## 4  D   4           yes
## 5  E   5            no
  1. {dplyr} approach, identical output
library(dplyr)

new_dat2 <- dat %>% 
  mutate(is_in_smaller = ifelse(id %in% minidat$id, "yes", "no"))
  1. {data.table}
library(data.table)

new_dat3 <- as.data.table(dat) # Assuming you already have a data.table object
new_dat3[, is_in_smaller := ifelse(id %in% minidat$id, "yes", "no")]

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Amit.L