tidyverse solution: is there a way to keep only rows when a certain word/value occurs e.g. 3 times in a column?

Let's say the data looks like this:

A <- c("name1", "name2", "name3", "name1", "name1", "name4")
B <- c(10, 8, 7, 3, -1, -2)
C <- c(8, 3, -1, -10, -2, -2)
df <- data.frame(A, B, C)
df

      A  B   C
1 name1 10   8
2 name2  8   3
3 name3  7  -1
4 name1  3 -10
5 name1 -1  -2
6 name4 -2  -2

Now there must be a smart way to "collect" ONLY the rows whose value in the first column (A) occurs three times into a new data frame. For this particular example that would be all rows that have "name1", because it appears three times. How can this be done if the dataset is very large? How can you detect and keep rows whose value occurs three times (or any other arbitrary number of times)?



Solution 1:[1]

Slightly different dplyr approach:

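library(dplyr)

df %>%
  add_count(A, name = "A_count") %>%  # count how often each value of A occurs
  filter(A_count == 3) %>%            # keep values that occur exactly 3 times
  select(-A_count)                    # drop the helper column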

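add_count() appends a column holding the number of times each value of A occurs; the name argument names that column (it would otherwise be called n). filter() then keeps only the rows whose count is 3, and select(-A_count) removes the helper column again. Replace 3 with any other number to keep values that occur that many times.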
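
For an arbitrary number of repeats, the same idea can be wrapped in a small function. The following is only a sketch, not part of the original answer: keep_n_times is a hypothetical helper name, it uses group_by() with n() instead of add_count(), and it assumes a dplyr version that supports the {{ }} operator (1.0 or later should work).

```r
library(dplyr)

# Same toy data as in the question
df <- data.frame(
  A = c("name1", "name2", "name3", "name1", "name1", "name4"),
  B = c(10, 8, 7, 3, -1, -2),
  C = c(8, 3, -1, -10, -2, -2)
)

# Hypothetical helper (not from the original answer):
# keep the rows whose value in `col` occurs exactly `k` times
keep_n_times <- function(data, col, k) {
  data %>%
    group_by({{ col }}) %>%   # one group per distinct value of `col`
    filter(n() == k) %>%      # n() is the number of rows in the current group
    ungroup()                 # return an ungrouped result
}

triples <- keep_n_times(df, A, 3)
triples
```

With the example data this should keep only the three "name1" rows. Note that the result is a tibble rather than a plain data.frame; changing n() == k to n() >= k would keep values that occur at least k times.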

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 jpenzer