How to split the top N and the rest by category in R?
I have the following dataframe:
Sale <- c("30","45","23","33","24","11","56","19","45","56","33","32","89","33","12","18","10","17")
category <- c("a","b","c","c","d","e","f","g","h","i","j","k","l","z","x","c","v","b")
growth <- c(33,16,49,57,45,67.75,90.85,10,33,76,3,77,88.98,65,98,76,45,89)
df <- data.frame(category,Sale,growth)
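Because `Sale` is built from quoted strings, the column is character rather than numeric (with the R >= 4.0.0 default `stringsAsFactors = FALSE`). That is why the solution starts with `type.convert()`. A minimal base R check of the problem and the fix, assuming that default, might look like this:

```r
# Rebuild the sample data; stringsAsFactors = FALSE matches the R >= 4.0.0
# default and keeps Sale as character
Sale <- c("30","45","23","33","24","11","56","19","45","56","33","32","89","33","12","18","10","17")
category <- c("a","b","c","c","d","e","f","g","h","i","j","k","l","z","x","c","v","b")
growth <- c(33,16,49,57,45,67.75,90.85,10,33,76,3,77,88.98,65,98,76,45,89)
df <- data.frame(category, Sale, growth, stringsAsFactors = FALSE)

class(df$Sale)  # "character": the values were quoted strings

# Summing a character column is an error; capture the message instead of stopping
msg <- tryCatch(sum(df$Sale), error = function(e) conditionMessage(e))
print(msg)

# Convert to numbers before ranking or summing
df$Sale <- as.numeric(df$Sale)
sum(df$Sale)  # 586
```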
I want to rank (order) the categories by sales. Next, I need to keep the top 9 categories and put all the rest into a new category called 'other'. Finally, I need to add a rank column to this new data frame, where each category gets its rank in sales order and the 'other' category always stays at rank 10. Like this:
Category Sale Rank growth
l 89 1 ...
f 56 2
i 56 3
h 45 4
b 45 5
c 36 6
j 33 7
k 32 8
z 32 9
other 164 10
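The three steps described above (sort by sales, lump everything after the 9th row, pin 'other' at rank 10) can also be sketched in base R without dplyr. This is a sketch, not the posted solution: `top_n_other` is a hypothetical helper name, and like the posted solution it ranks individual rows rather than per-category totals.

```r
# Sample data from the question, with Sale converted to numeric up front
Sale <- c("30","45","23","33","24","11","56","19","45","56","33","32","89","33","12","18","10","17")
category <- c("a","b","c","c","d","e","f","g","h","i","j","k","l","z","x","c","v","b")
growth <- c(33,16,49,57,45,67.75,90.85,10,33,76,3,77,88.98,65,98,76,45,89)
df <- data.frame(category, Sale = as.numeric(Sale), growth, stringsAsFactors = FALSE)

# Hypothetical helper: keep the n largest rows by Sale, lump the rest into "other"
top_n_other <- function(df, n = 9) {
  # order() is stable, so tied sales keep their original row order
  df <- df[order(-df$Sale), ]
  keep <- seq_len(nrow(df)) <= n
  rest <- df[!keep, ]
  other <- data.frame(category = "other",
                      Sale = sum(rest$Sale),
                      growth = sum(rest$growth),  # summed, as in the posted solution
                      stringsAsFactors = FALSE)
  out <- rbind(df[keep, ], other)
  out$rank <- seq_len(nrow(out))  # "other" is appended last, so it gets rank n + 1
  rownames(out) <- NULL
  out
}

res <- top_n_other(df)
print(res)
```

Appending the 'other' row after the sorted top rows is what keeps it at rank 10 regardless of its total, which here (164) is larger than any single category.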
Solution 1:[1]
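library(dplyr)

df %>%
  type.convert(as.is = TRUE) %>%                          # Sale arrives as character; convert it to integer
  mutate(rank = row_number(desc(Sale)),                    # 1 = largest sale; ties keep original row order
         category = ifelse(rank > 9, 'other', category),   # lump every row below the top 9
         rank = ifelse(rank > 9, 10, rank)) %>%            # and pin the lumped rows to rank 10
  group_by(category, rank) %>%
  summarise(Sale = sum(Sale), growth = sum(growth), .groups = 'drop') %>%  # collapse the 'other' rows into one
  arrange(rank)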
# A tibble: 10 x 4
category rank Sale growth
<chr> <dbl> <int> <dbl>
1 l 1 89 89.0
2 f 2 56 90.8
3 i 3 56 76
4 b 4 45 16
5 h 5 45 33
6 c 6 33 57
7 j 7 33 3
8 z 8 33 65
9 k 9 32 77
10 other 10 164 513.
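The solution above ranks rows, so category c keeps only its 33 row in the top 9 while its 23 and 18 rows fall into 'other' (likewise b's 17 row). If "rank categories by sales" is meant as ranking per-category totals, which is an assumption about the intent, one option is to total each category before lumping. A base R sketch, summing growth the same way the solution does:

```r
# Sample data from the question, with Sale converted to numeric
Sale <- c("30","45","23","33","24","11","56","19","45","56","33","32","89","33","12","18","10","17")
category <- c("a","b","c","c","d","e","f","g","h","i","j","k","l","z","x","c","v","b")
growth <- c(33,16,49,57,45,67.75,90.85,10,33,76,3,77,88.98,65,98,76,45,89)
df <- data.frame(category, Sale = as.numeric(Sale), growth, stringsAsFactors = FALSE)

# Total each category first (c and b occur in several rows)
totals <- aggregate(cbind(Sale, growth) ~ category, data = df, FUN = sum)

# Rank the category totals; order() leaves ties in their existing order
totals <- totals[order(-totals$Sale), ]
keep <- seq_len(nrow(totals)) <= 9
other <- data.frame(category = "other",
                    Sale = sum(totals$Sale[!keep]),
                    growth = sum(totals$growth[!keep]),
                    stringsAsFactors = FALSE)
res <- rbind(totals[keep, ], other)
res$rank <- seq_len(nrow(res))  # "other" is last, so it is always rank 10
rownames(res) <- NULL
print(res)
```

With totals, c (74) and b (62) move up to ranks 2 and 3, and 'other' shrinks to 106 because only categories a, d, g, x, e and v fall outside the top 9.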
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | onyambu |
