'How to subset R data frame based on duplicates in one column and unique values in another

This seems pretty straightforward but I am stumped. I have a data frame that looks like this:

df1 %>% head()
  values  paired
  <ch>     <int>
1 apples      1         
2 x           1          
3 oranges     2          
4 z           2          
5 bananas     3          
6 y           3 
7 apples      4
8 p           4

I would like to create a new data frame that extracts all paired values based on a search criteria. So if I want all pairs that correspond to apples I would like to end up with something like this:

df1 %>% head()
  values  paired
  <ch>     <int>
1 apples      1         
2 x           1                   
3 apples      4
4 p           4

I have tried using:

new_pairs <- df1 %>%
  arrange(values, paired) %>%
  filter(duplicated(paired) == TRUE,
         values=="apples") 

But I am getting only the apple rows back

r


Solution 1:[1]

Here is another tidyverse possibility. I filter for the rows that have apples and also keep the rows that immediately follow apples.

library(tidyverse)

df %>%
  filter((values == "apples" |
            row_number() %in% c(which(
              values == "apples", arr.ind = TRUE
            ) + 1)))

Output

  values paired
1 apples      1
2      x      1
3 apples      4
4      p      4

Here is a data.table option (subset is only used to change the order of the columns):

library(data.table)

dt <- as.data.table(df)
subset(dt[, .SD[any(values == "apples")], by = paired], select = c("values", "paired"))

   values paired
1: apples      1
2:      x      1
3: apples      4
4:      p      4

Data

df <-
    structure(list(
      values = c("apples", "x", "oranges", "z", "bananas",
                 "y", "apples", "p"),
      paired = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L)
    ),
    class = "data.frame",
    row.names = c(NA,-8L))

Solution 2:[2]

In base R, find the values of the pairs of interest

pairs = subset(df1, values %in% "apples")$paired

and create a subset of the data

subset(df1, paired %in% pairs)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Martin Morgan