Using ntile within a group
I'm trying to calculate the quintiles within each group of a dataframe. If I do:
library(dplyr)

mtcars %>%
  group_by(gear, carb) %>%
  summarise(total = sum(wt), .groups = "keep") %>%
  mutate(rank = ntile(total, 5))
All entries within the rank column are equal to 1. What am I doing wrong here?
Solution 1:[1]
When you call group_by(gear, carb), each unique combination of these two variables is treated as a group. Because you used summarise(..., .groups = "keep"), both grouping variables are preserved in the result. After summarising, each combination of gear and carb appears in exactly one row, so every row is in its own group (note # Groups: gear, carb [11] in the tibble output). You are therefore calculating ntile() over a single element in every group, and the result is always 1.
If you omit the .groups = "keep" argument, the last grouping variable (carb) is dropped, and rank is computed within each gear (note # Groups: gear [3]).
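One way to confirm this is to inspect the grouping of the summarised tibble directly. The following is a minimal sketch, assuming a dplyr version that supports the .groups argument (1.0.0 or later); the names kept and dropped are illustrative and do not appear in the original post.

```r
library(dplyr)

# Both grouping variables preserved: one group per gear/carb combination.
kept <- mtcars %>%
  group_by(gear, carb) %>%
  summarise(total = sum(wt), .groups = "keep")

# Last grouping variable (carb) dropped: one group per gear.
dropped <- mtcars %>%
  group_by(gear, carb) %>%
  summarise(total = sum(wt), .groups = "drop_last")

group_vars(kept)     # "gear" "carb"
n_groups(kept)       # 11
group_vars(dropped)  # "gear"
n_groups(dropped)    # 3

# With one row per group, ntile() has a single element to rank,
# so every rank is 1.
kept_ranks <- kept %>%
  mutate(rank = ntile(total, 5)) %>%
  pull(rank)
unique(kept_ranks)   # 1
```

Checking group_vars() and n_groups() before the mutate() step makes it clear which rows ntile() will be ranking against each other.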
A little more information in the documentation (?dplyr::summarise):
When .groups is not specified, it is chosen based on the number of rows of the results:
- If all the results have 1 row, you get "drop_last".
- If the number of rows varies, you get "keep".
library(dplyr)

mtcars %>%
  group_by(gear, carb) %>%
  summarise(total = sum(wt)) %>%
  mutate(rank = ntile(total, 5))
# A tibble: 11 × 4
# Groups:   gear [3]
    gear  carb total  rank
   <dbl> <dbl> <dbl> <int>
 1     3     1  9.14     1
 2     3     2 14.2      3
 3     3     3 11.6      2
 4     3     4 23.4      4
 5     4     1  8.29     1
 6     4     2 10.7      2
 7     4     4 12.4      3
 8     5     2  3.65     4
 9     5     4  3.17     2
10     5     6  2.77     1
11     5     8  3.57     3
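If you would rather not rely on the default, the same result can be requested explicitly. This is a sketch rather than part of the original answer: it assumes .groups = "drop_last" is the behaviour you want, and the name ranked and the trailing ungroup() are additions for illustration.

```r
library(dplyr)

ranked <- mtcars %>%
  group_by(gear, carb) %>%
  # State the grouping of the result explicitly instead of relying on the default.
  summarise(total = sum(wt), .groups = "drop_last") %>%
  # The result is still grouped by gear, so ntile() ranks within each gear.
  mutate(rank = ntile(total, 5)) %>%
  # Remove the remaining grouping so later steps operate on all rows.
  ungroup()

ranked
```

Setting .groups explicitly also suppresses the informational message that summarise() prints about the grouping of its output. Note that no gear in this data has five or more carb values, so the ranks never reach 5 even though five buckets were requested.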
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
