'Subset groups in a data.table using conditions on two columns
I have a data.table with a high number of groups. I would like to subset whole groups (not just rows) based on the conditions on multiple columns. Consider the following data.table:
DT <- structure(list(id = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L),
group = c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C"),
y = c(14, 19, 16, 10, 6, 8, 14, 19, 10, 9, 6, 8),
x = c(3, 3, 2, 3, 3, 3, 3, 2, 2, 3, 3, 3)),
row.names = c(NA, -12L),
class = c("data.table", "data.frame"))
>DT
id group y x
1: 1 A 14 3
2: 2 A 19 3
3: 3 A 16 2
4: 4 A 10 3
5: 5 B 6 3
6: 6 B 8 3
7: 7 B 14 3
8: 8 B 19 2
9: 9 C 10 2
10: 10 C 9 3
11: 11 C 6 3
12: 12 C 8 3
I would like to keep groups that have y=6 and x=3 in the same row. So that I would have only class B and C (preferably using data.table package in R):
id group y x
1: 5 B 6 3
2: 6 B 8 3
3: 7 B 14 3
4: 8 B 19 2
5: 9 C 10 2
6: 10 C 9 3
7: 11 C 6 3
8: 12 C 8 3
All my attempts gave me only those rows containing y=6 and x=3, which I do not want:
id group y x
1: 5 B 6 3
2: 11 C 6 3
Solution 1:[1]
Try dplyr package
#select groups containing y and x
groups = DT %>% filter(y == 6, x == 3) %>% select(group) %>% unique() %>% unlist() %>% as.vector()
# filter for selected groups
DT %>% filter(group %in% groups)
Solution 2:[2]
A data.table option
> DT[group %in% DT[.(3, 6), group, on = .(x, y)]]
id group y x
1: 5 B 6 3
2: 6 B 8 3
3: 7 B 14 3
4: 8 B 19 2
5: 9 C 10 2
6: 10 C 9 3
7: 11 C 6 3
8: 12 C 8 3
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | rkabuk |
| Solution 2 | ThomasIsCoding |
