'How to select group based on the order of rows within group in R
For example, I have the following dataframe:
| ID | variable | order |
|---|---|---|
| 1 | a | 1 |
| 1 | b | 2 |
| 2 | b | 1 |
| 2 | a | 2 |
| 2 | b | 3 |
| 3 | b | 1 |
| 3 | a | 2 |
I would like to keep only the ID groups where "a" appears before "b" (i.e., the "order" of a is smaller than b). So the result would look something like this:
| ID | variable | order |
|---|---|---|
| 1 | a | 1 |
| 1 | b | 2 |
| 2 | b | 1 |
| 2 | a | 2 |
| 2 | b | 3 |
Where only ID 1 and 2 remain (with all of its original rows), and all rows in ID 3 are removed because the "order" of b is smaller than a. Would anyone have guidance on how this could be done in R?
Solution 1:[1]
your_data %>%
group_by(ID) %>%
filter(any(variable == "a" & lead(variable, default = "not b") == "b"))
This ignores the order column and is based on the order of rows within each ID group. It check for the presence of an "a" on one row AND a "b" on the immediate next row.
In your comment you say '"a" right before "b"' - I went with that "right before" clarification. If a group has values "a", "c", "b" it would not be kept in my answer as the "a" is not "right before" the "b".
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
