R: Finding the maximum value in subsets of observations without using summarise or filter
Hello everyone,
I have the following example data frame, with the ID of patients (1 and 2), their category X (YES or NO), and the values of a parameter:
df <- data.frame(ID = c(1, 1, 1, 1, 2, 2, 2),
                 X = c("YES", "YES", "NO", "NO", "YES", "NO", "NO"),
                 Value = c(10, 15, 12, 13, 18, 16, 17))
df
This provides the following table:
  ID   X Value
1  1 YES    10
2  1 YES    15
3  1  NO    12
4  1  NO    13
5  2 YES    18
6  2  NO    16
7  2  NO    17
I would like to add a new column, Result, that gives the maximum value per patient among the rows where X is "YES", as follows:
  ID   X Value Result
1  1 YES    10     15
2  1 YES    15     15
3  1  NO    12     15
4  1  NO    13     15
5  2 YES    18     18
6  2  NO    16     18
7  2  NO    17     18
I know that I can use group_by and summarise to obtain the values, but I would like to use mutate so that I can keep track of all the variables I build for this project and, for the same reason, avoid the filter function.
The following attempt produces a Result column, but it computes the maximum within each combination of ID and X, whereas I would like a single value per ID:
library(dplyr)

df %>%
  group_by(ID, X) %>%
  mutate(Result = max(Value))
     ID X     Value Result
  <dbl> <chr> <dbl>  <dbl>
1     1 YES      10     15
2     1 YES      15     15
3     1 NO       12     13
4     1 NO       13     13
5     2 YES      18     18
6     2 NO       16     17
7     2 NO       17     17
Thank you very much for your help.
Solution 1:[1]
Using data.table
library(data.table)
setDT(df)[, Result := max(Value[X == "YES"], na.rm = TRUE), by = ID]
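Since the question asks for a mutate-based approach, the same idea (subsetting Value by X == "YES" within each ID) might be sketched in dplyr as follows. This is an illustrative sketch rather than part of the original answer; the object name result_df is an assumption made for the example, and the data frame is rebuilt from the question so that the block is self-contained.

```r
library(dplyr)

# Example data copied from the question
df <- data.frame(
  ID    = c(1, 1, 1, 1, 2, 2, 2),
  X     = c("YES", "YES", "NO", "NO", "YES", "NO", "NO"),
  Value = c(10, 15, 12, 13, 18, 16, 17)
)

# Group by patient only, then take the maximum over the "YES" rows.
# The single value returned by max() is recycled to every row of the group.
result_df <- df %>%
  group_by(ID) %>%
  mutate(Result = max(Value[X == "YES"])) %>%
  ungroup()

result_df
```

One caveat: if a patient has no "YES" rows, max() is called on an empty vector and returns -Inf with a warning, so such IDs may need separate handling.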
Solution 2:[2]
What about this?
> transform(df, Result = ave(Value, ID, X, FUN = max))
  ID   X Value Result
1  1 YES    10     15
2  1 YES    15     15
3  1  NO    12     13
4  1  NO    13     13
5  2 YES    18     18
6  2  NO    16     17
7  2  NO    17     17
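Note that the output above is grouped by both ID and X, so the "NO" rows receive their own maximum. If a single value per ID is wanted, one possible base R variant is to blank out the non-"YES" values before calling ave with ID as the only grouping variable. This is a sketch under that assumption, not the answer author's code; the name res is hypothetical.

```r
# Example data copied from the question
df <- data.frame(
  ID    = c(1, 1, 1, 1, 2, 2, 2),
  X     = c("YES", "YES", "NO", "NO", "YES", "NO", "NO"),
  Value = c(10, 15, 12, 13, 18, 16, 17)
)

# Replace the values of non-"YES" rows with NA, then take the per-ID
# maximum while ignoring the NAs; ave() recycles it across each group.
res <- transform(
  df,
  Result = ave(
    replace(Value, X != "YES", NA),
    ID,
    FUN = function(v) max(v, na.rm = TRUE)
  )
)

res
```

As with any max over a possibly empty subset, an ID without "YES" rows would yield -Inf and a warning.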
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | akrun |
Solution 2 | ThomasIsCoding |