How to get the participant number in data with repeat instances
library(data.table)
library(tidyverse)
participant.index = c(1,2,3,3,4,5,5,5,6,7)
repeat.instance = c(1,1,1,2,1,1,2,3,1,1)
fruits.eaten = c("apple","apple", "grapes", "oranges", "oranges", "pineapple",
"pear", "pineapple","banana", "pear")
gender =c("male", "female", "male", "male", "female",
"male","male","male","male","female")
mydata = data.table(participant.index,repeat.instance,fruits.eaten,gender)
Identify the total number of males and females:
dt1 = mydata %>% filter(repeat.instance == 1)
dt1[, .N, by = gender]
#> gender N
#> 1: male 4
#> 2: female 3
I did this by using the filter function to create a separate data.table. With large data sets that have several such columns, where the value is the same across a participant's repeat instances, is there a better way to do this? Please help. Thanks in advance.
Solution 1:[1]
Size does not matter much here. I notice you use dplyr to filter and then data.table to group; you can do both in a single data.table call.
# using the repeat.instance == 1
mydata[repeat.instance == 1, .N, by = gender]
# perhaps more elegant as you use all unique participants
mydata[!duplicated(participant.index), .N, by = gender]
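Another option, not part of the original answer, is to count distinct participants per group with data.table's `uniqueN()`. This is a minimal sketch that assumes gender is constant within each participant; if a participant had different genders recorded across repeat instances, they would be counted once in each group. The name `counts` is only illustrative.

```r
library(data.table)

# Rebuild the example data from the question
mydata <- data.table(
  participant.index = c(1, 2, 3, 3, 4, 5, 5, 5, 6, 7),
  repeat.instance   = c(1, 1, 1, 2, 1, 1, 2, 3, 1, 1),
  gender = c("male", "female", "male", "male", "female",
             "male", "male", "male", "male", "female")
)

# Count distinct participants per gender in one call,
# without filtering or creating an intermediate table
counts <- mydata[, .(N = uniqueN(participant.index)), by = gender]
print(counts)
```

Unlike the `repeat.instance == 1` filter, this does not rely on every participant having a row numbered 1.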
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Merijn van Tilborg |
