How to get the participant number in data with repeat instances

    library(data.table)
    library(tidyverse)
    
    participant.index = c(1,2,3,3,4,5,5,5,6,7)
    repeat.instance = c(1,1,1,2,1,1,2,3,1,1)
    fruits.eaten = c("apple","apple", "grapes", "oranges", "oranges", "pineapple",
                     "pear", "pineapple","banana", "pear")
    gender =c("male", "female", "male", "male", "female",
              "male","male","male","male","female")
    mydata = data.table(participant.index,repeat.instance,fruits.eaten,gender)

Identify total number of males and females

    dt1 = mydata %>% filter(repeat.instance == 1)
    dt1[, .N, by = gender]

    #>    gender N
    #> 1:   male 4
    #> 2: female 3

I did this by using the filter function and creating a separate data.table. With big data, and with several such columns whose values are the same across a participant's repeat instances, is there a better way to do this? Please help. Thanks in advance.

r


Solution 1:[1]

Size does not matter too much. I notice you use dplyr to filter and then data.table to group; you can do both in one data.table call.

    # keep only the first instance of each participant, then count by gender
    mydata[repeat.instance == 1, .N, by = gender]

    # perhaps more elegant, since it keeps the first row of every unique participant
    mydata[!duplicated(participant.index), .N, by = gender]
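
As a further sketch along the same lines, two other data.table idioms should give the same counts on the example data. Both assume that gender never changes within a participant. The names `participants`, `by_gender_first` and `by_gender_distinct` are only illustrative and are not from the question.

```r
library(data.table)

# Same toy data as in the question
mydata <- data.table(
  participant.index = c(1, 2, 3, 3, 4, 5, 5, 5, 6, 7),
  repeat.instance   = c(1, 1, 1, 2, 1, 1, 2, 3, 1, 1),
  fruits.eaten      = c("apple", "apple", "grapes", "oranges", "oranges",
                        "pineapple", "pear", "pineapple", "banana", "pear"),
  gender            = c("male", "female", "male", "male", "female",
                        "male", "male", "male", "male", "female")
)

# Option A: keep the first row of each participant once, then reuse that
# table for any participant-level column (gender here)
participants <- unique(mydata, by = "participant.index")
by_gender_first <- participants[, .N, by = gender]

# Option B: count distinct participant IDs within each gender group;
# this does not depend on row order or on repeat.instance starting at 1
by_gender_distinct <- mydata[, .(N = uniqueN(participant.index)), by = gender]
```

Option A may be convenient when several participant-level columns need counting, because the deduplicated table is built only once. Option B stays a single call and keeps working if some participant's first instance is missing from the data.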

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Merijn van Tilborg