'R: Filter one column based on another with many to many mapping
I have a dataset with an ID column and an item column. An ID is mapped to one or more items. The dataset has a row for each item mapped to an ID. I want to return IDs that contain my_items. The order of the items does not matter. I have a toy example below.
ID <- c(1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 5, 5)
item <- c("a", "b", "c", "a", "b", "c", "d", "a", "b", "d", "b", "a", "c")
df <- data.frame(cbind(ID, item))
df
my_items <- c("a", "b", "c")
My expected output would only include item ID 1 and 5.
Solution 1:[1]
df %>%
group_by(ID) %>%
filter(setequal(item,my_items))
Output
ID item
<chr> <chr>
1 1 a
2 1 b
3 1 c
4 5 b
5 5 a
6 5 c
Solution 2:[2]
We can use all after creating a logical vector with %in% and grouping by 'ID' and also create a condition with n_distinct
library(dplyr)
df %>%
group_by(ID) %>%
filter(all(my_items %in% item), n_distinct(item) == 3) %>%
ungroup
-output
# A tibble: 6 × 2
ID item
<dbl> <chr>
1 1 a
2 1 b
3 1 c
4 5 b
5 5 a
6 5 c
Solution 3:[3]
If we add arrange, we could also use identical in this case:
library(dplyr)
df %>%
group_by(ID) %>%
arrange(item, .by_group = TRUE) %>%
filter(identical(item,my_items))
ID item
<chr> <chr>
1 1 a
2 1 b
3 1 c
4 5 a
5 5 b
6 5 c
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | langtang |
| Solution 2 | |
| Solution 3 | TarJae |

