Creating new columns based on certain strings appearing in a variable at least twice
I have a data frame containing the variables 'id' and 'var1' similar to the following:
set.seed(100)
id <- sample(1:3, 10, replace = TRUE)
set.seed(101)
var1 <- sample(LETTERS[1:3], 10, replace = TRUE)
df <- data.frame(id, var1)
I want to group the data frame by 'id' and create new columns 'condition1', 'condition2', 'condition3' and so on, if certain strings appear in var1 at least twice. So, when 'df' is grouped by 'id', 'condition1' will be 1 if var1 == 'A' and appears in at least 2 rows or else 'condition1' will be set to 0. Similarly, 'condition2' will be based on 'B' and 'condition3' will be based on 'C'.
So far, I have tried to use dplyr and come up with the following:
library(dplyr)
df2 <- df %>%
group_by(id) %>%
summarise(condition1 = case_when(**var1 == "A" appears in at least 2 rows** ~ 1, **var1 == "A" appears only once or does not appear at all** ~ 0),
condition2 = case_when(**var1 == "B" appears in at least 2 rows** ~ 1, **var1 == "B" appears only once or does not appear at all** ~ 0),
condition3 = case_when(**var1 == "C" appears in at least 2 rows** ~ 1, **var1 == "C" appears only once or does not appear at all** ~ 0))
How do I correctly define the conditions inside case_when? Any other way to solve this would be welcome as well.
Solution 1:[1]
Using data.table
library(data.table)
df <- data.table(df)
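# For each id, flag with 1/0 whether each letter appears in at least two rows
df[, .(condition1 = as.integer(sum(var1 == "A") > 1),
       condition2 = as.integer(sum(var1 == "B") > 1),
       condition3 = as.integer(sum(var1 == "C") > 1)), by = id]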
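The same counting idea should carry over to the dplyr pipeline from the question. The following is a minimal sketch, assuming that `sum(var1 == "A") >= 2` is an acceptable stand-in for the `case_when` placeholders; it rebuilds `df` as in the question so that it runs on its own.

```r
library(dplyr)

# Rebuild the example data from the question
set.seed(100)
id <- sample(1:3, 10, replace = TRUE)
set.seed(101)
var1 <- sample(LETTERS[1:3], 10, replace = TRUE)
df <- data.frame(id, var1)

# One row per id; each condition is 1 if the letter occurs in at least
# two rows of that group and 0 otherwise
df2 <- df %>%
  group_by(id) %>%
  summarise(
    condition1 = as.integer(sum(var1 == "A") >= 2),
    condition2 = as.integer(sum(var1 == "B") >= 2),
    condition3 = as.integer(sum(var1 == "C") >= 2)
  )
df2
```

Since each condition reduces to a single logical test per group, `case_when` is probably unnecessary here; `as.integer()` converts the TRUE/FALSE result into the 1/0 coding the question asks for.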
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
