'Filter column by group based on another column (R dplyr)
I want to group a column and filter based on occurrence of certain values in another column. In my case, I have an ID variable and want to keep only instances where Time = 1 and = 2. This is a larger dataset so I'm not able to simply keep duplicates.
I tried using group_by and filter, but it has not produced the desired table.
df <- tibble(s1 = c(1001, 1001, 1002, 1002, 1003, 1004), s2 = c(1, 2, 1, 2, 2, 1))
df_filter <- df %>%
group_by(s1) %>%
filter(all(c(1, 2)) %in% s2)
Input:
| ID | Time |
|---|---|
| 1001 | 1 |
| 1001 | 2 |
| 1002 | 1 |
| 1002 | 2 |
| 1003 | 2 |
| 1004 | 1 |
Desired Output:
| s1 | s2 |
|---|---|
| 1001 | 1 |
| 1001 | 2 |
| 1002 | 1 |
| 1002 | 2 |
Solution 1:[1]
You can use n() from dplyr library. It will give you the count of rows. Then you can compare it with 1.
Here's how you can do it:
df_filter <- df %>%
group_by(s1) %>%
filter(n() > 1)
And if you want only 1 or 2 in s2, then you can extend your filter like this:
df_filter <- df %>%
group_by(s1) %>%
filter(s2 == 1 | s2 == 2) %>%
filter(n() > 1)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Vishal A. |
