Count first occurrence of a dummy (grouped by) in R and then sum

df <- structure(list(id = c(1L, 1L, 2L, 3L, 3L, 3L, 4L), hire_year = c(2017L, 
2017L, 2017L, 2017L, 2016L, 2014L, 2016L), dummy = c(0L, 0L, 
1L, 0L, 0L, 0L, 1L)), class = "data.frame", row.names = c(NA, 
-7L))

  id hire_year dummy
1  1      2017     0
2  1      2017     0
3  2      2017     1
4  3      2017     0
5  3      2016     0
6  3      2014     0
7  4      2016     1

I would like to count the rows where dummy equals 0, but each id should be counted only once, even if that id has several rows with dummy equal to 0. For the data above, I expect the output to be 2 (ids 1 and 3).



Solution 1:[1]

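You can use distinct() to keep only the first row for each id, then count the rows where dummy is 0. This assumes dummy has the same value on every row of a given id, as it does in the example data.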

library(dplyr)

df %>%
  distinct(id, .keep_all = TRUE) %>%   # keep the first row of each id
  summarise(dummy = sum(dummy == 0))   # count ids whose kept row has dummy == 0

#  dummy
#1     2
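
If dummy can differ between rows of the same id, keeping only the first row per id may miss an id whose 0 appears later. A minimal sketch of one alternative, assuming the goal is "ids with at least one row where dummy is 0": filter first, then count distinct ids with dplyr's n_distinct(). The column name n_ids is only an illustrative choice, not part of the original answer.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  id = c(1L, 1L, 2L, 3L, 3L, 3L, 4L),
  hire_year = c(2017L, 2017L, 2017L, 2017L, 2016L, 2014L, 2016L),
  dummy = c(0L, 0L, 1L, 0L, 0L, 0L, 1L)
)

# Keep only the rows where dummy is 0, then count the distinct ids among them
result <- df %>%
  filter(dummy == 0) %>%
  summarise(n_ids = n_distinct(id))

result
#   n_ids
# 1     2
```

On the example data this gives the same answer as the distinct() approach, because dummy is constant within each id.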

Solution 2:[2]

length(unique(df$id[df$dummy == 0]))
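
The one-liner above can be read from the inside out. A sketch that splits it into steps on the question's data, with intermediate variable names (is_zero, ids_with_zero, n_ids) chosen here purely for illustration:

```r
# Example data from the question
df <- data.frame(
  id = c(1L, 1L, 2L, 3L, 3L, 3L, 4L),
  hire_year = c(2017L, 2017L, 2017L, 2017L, 2016L, 2014L, 2016L),
  dummy = c(0L, 0L, 1L, 0L, 0L, 0L, 1L)
)

# Logical vector: TRUE for rows where dummy is 0
is_zero <- df$dummy == 0

# ids of those rows, duplicates included: 1 1 3 3 3
ids_with_zero <- df$id[is_zero]

# Drop duplicate ids (1 3), then count them
n_ids <- length(unique(ids_with_zero))

n_ids
# [1] 2
```

This counts every id that has at least one row with dummy equal to 0, regardless of row order, and needs no packages beyond base R.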

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Ronak Shah
Solution 2 langtang