How to get the participant number in data with repeat instances

    library(data.table)
    library(tidyverse)
    
    participant.index = c(1,2,3,3,4,5,5,5,6,7)
    repeat.instance = c(1,1,1,2,1,1,2,3,1,1)
    fruits.eaten = c("apple","apple", "grapes", "oranges", "oranges", "pineapple",
                     "pear", "pineapple","banana", "pear")
    gender =c("male", "female", "male", "male", "female",
              "male","male","male","male","female")
    mydata = data.table(participant.index,repeat.instance,fruits.eaten,gender)

Identify total number of males and females

    # keep only each participant's first instance (compare with a number, not the string '1')
    dt1 = mydata %>% filter(repeat.instance == 1)
    dt1[, .N, by = gender]

    #>    gender N
    #> 1:   male 4
    #> 2: female 3

I did this by using the filter function and creating a separate data.table. But when dealing with big data that has multiple such columns where the value is the same across a participant's repeat instances, is there a better way to do this? Please help. Thanks in advance.

Solution 1:^[1]

Size does not matter too much. I notice you use dplyr to filter and then data.table to group; you can do both in a single data.table call.

    # using the repeat.instance == 1
    mydata[repeat.instance == 1, .N, by = gender]

    # perhaps more elegant as you use all unique participants
    mydata[!duplicated(participant.index), .N, by = gender]
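
A further variation, offered as a minimal sketch rather than part of the original answer: data.table's `uniqueN()` counts distinct participants within each group, so no filtering step is needed at all. It assumes that gender is constant across a participant's repeat instances, as it is in the example data; the name `by_gender` is only illustrative.

```r
library(data.table)

# Same example data as in the question (fruits.eaten omitted, it is not needed here)
mydata <- data.table(
  participant.index = c(1, 2, 3, 3, 4, 5, 5, 5, 6, 7),
  repeat.instance   = c(1, 1, 1, 2, 1, 1, 2, 3, 1, 1),
  gender = c("male", "female", "male", "male", "female",
             "male", "male", "male", "male", "female")
)

# Count distinct participants per gender; repeat instances are ignored
# because each participant.index is counted once within its gender group.
# Assumption: a participant's gender does not change between instances.
by_gender <- mydata[, .(N = uniqueN(participant.index)), by = gender]
print(by_gender)
```

If a participant could appear with different values in the grouping column, this would count them once per value, so the `!duplicated(participant.index)` approach would be the safer choice in that case.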

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Merijn van Tilborg
