'R: select the IDs with 2 records (rows) in a certain domain
ID <- c(1,1,1,2,2,2,2,3,4,4,4,5,5,5,5,5)
date <- c('2018-06-29','2018-08-29','2018-08-29','2020-12-11','2020-12-12','2021-01-21','2021-12-21','2020-11-29','2019-07-12','2018-05-13','2021-06-08','2017-11-14','2018-07-15','2019-01-15','2020-02-19','2021-11-09')
domain <- c('condition','condition','procedure','condition','procedure','condition','condition','procedure','condition','procedure','procedure','procedure','condition','procedure','condition','procedure')
df <- cbind(ID, date, domain)
Example dateset:
ID date domain
1 2018-06-29 condition
1 2018-08-29 condition
1 2018-08-29 procedure
2 2020-12-11 condition
2 2020-12-12 procedure
2 2021-01-21 condition
2 2021-12-21 condition
3 2020-11-29 procedure
4 2019-07-12 condition
4 2018-05-13 procedure
4 2021-06-08 procedure
5 2017-11-14 procedure
5 2018-07-15 condition
5 2019-01-15 procedure
5 2020-02-19 condition
5 2021-11-09 procedure
As you can see, some IDs are with one row in domain 'procedure', while others are with two or three rows in domain 'procedure'. May I ask how to separate the IDs with different number of rows in domain 'procedure'?
Thank you!!
Solution 1:[1]
We can use split to split the 'ID' based on the number of 'procedure' elements in domain per 'ID' into a list of vectors
with(df, split(ID, ave(domain == "procedure", ID, FUN = sum)))
-output
$`1`
[1] 1 1 1 2 2 2 2 3
$`2`
[1] 4 4 4
$`3`
[1] 5 5 5 5 5
If we want unique IDs
with(with(df, stack(tapply(domain == "procedure", ID, FUN = sum))),
split(as.character(ind), values))
Update
If we wanted the separate the data.frame into subset of data.frame based on the number of occurrence of 'procedure' for each 'ID', then split on the dataset instead of the column
split(df, ave(domain == "procedure", ID, FUN = sum))
$`1`
ID date domain
1 1 2018-06-29 condition
2 1 2018-08-29 condition
3 1 2018-08-29 procedure
4 2 2020-12-11 condition
5 2 2020-12-12 procedure
6 2 2021-01-21 condition
7 2 2021-12-21 condition
8 3 2020-11-29 procedure
$`2`
ID date domain
9 4 2019-07-12 condition
10 4 2018-05-13 procedure
11 4 2021-06-08 procedure
$`3`
ID date domain
12 5 2017-11-14 procedure
13 5 2018-07-15 condition
14 5 2019-01-15 procedure
15 5 2020-02-19 condition
16 5 2021-11-09 procedure
data
df <- data.frame(ID, date, domain)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
