'How to select group based on the order of rows within group in R

For example, I have the following dataframe:

ID variable order
1 a 1
1 b 2
2 b 1
2 a 2
2 b 3
3 b 1
3 a 2

I would like to keep only the ID groups where "a" appears before "b" (i.e., the "order" of a is smaller than b). So the result would look something like this:

ID variable order
1 a 1
1 b 2
2 b 1
2 a 2
2 b 3

Where only ID 1 and 2 remain (with all of its original rows), and all rows in ID 3 are removed because the "order" of b is smaller than a. Would anyone have guidance on how this could be done in R?



Solution 1:[1]

your_data %>%
  group_by(ID) %>%
  filter(any(variable == "a" & lead(variable, default = "not b") == "b"))

This ignores the order column and is based on the order of rows within each ID group. It check for the presence of an "a" on one row AND a "b" on the immediate next row.

In your comment you say '"a" right before "b"' - I went with that "right before" clarification. If a group has values "a", "c", "b" it would not be kept in my answer as the "a" is not "right before" the "b".

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1