How to calculate many percentages at once without making your script too big
The mtcars dataset contains the variable "carb" with the number of carburetors. First I want to find out how many cars have 1, 2, 3, etc. carburetors. I used the dplyr verb count().
library(dplyr)
df <- mtcars
N <- df %>%
  count(carb)
which results in:
> N
carb n
1 1 7
2 2 10
3 3 3
4 4 10
5 6 1
6 8 1
Then I want to know how many cars with 1 carb, with 2 carbs, with 3 carbs, etc. have either 4, 6, or 8 cylinders.
For example, I used filter() to find the total number of cars with 1 carb and 4 cylinders:
carb1cyl4 <- df %>%
  filter(carb == 1, cyl == 4) %>%
  count() %>%
  rename(carb1cyl4 = n)
which results in:
carb1cyl4
1 5
I did the same for 6 and 8 cylinders, with the following results:
carb1cyl6
1 2
carb1cyl8
1 0
If I continued this for all carbs, I could combine the results with bind_rows() and bind_cols() and then calculate the percentage of cars with a given number of carbs and cyls using mutate(carbXcylX / N), that is, dividing the number of cars for each carb/cyl combination by the number of cars with the corresponding number of carbs.
The problem is that my dataset is much larger, so continuing down this route would take ages and be error-prone. Is there another way to calculate this?
A glimpse of the final outcome should look like this.
carb n perc1cy4 perc1cy6 perc1cy8
1 1 7 0.7142857 0.2857143 0
Thank you in advance!
Solution 1:[1]
Using table:
cbind(n = table(mtcars$carb),
      prop.table(with(mtcars, table(carb, cyl)), margin = 1))
# n 4 6 8
# 1 7 0.7142857 0.2857143 0.0
# 2 10 0.6000000 0.0000000 0.4
# 3 3 0.0000000 0.0000000 1.0
# 4 10 0.0000000 0.4000000 0.6
# 6 1 0.0000000 1.0000000 0.0
# 8 1 0.0000000 0.0000000 1.0
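The result above is a numeric matrix. If a data frame with named percentage columns is preferred, closer to the layout asked for in the question, one possible sketch is shown below. The column prefix perc_cyl and the object names tab, props, and result are illustrative choices, not part of the original answer.

```r
# Cross-tabulate carb by cyl, then turn each row into proportions.
tab <- with(mtcars, table(carb, cyl))
props <- as.data.frame.matrix(prop.table(tab, margin = 1))

# Illustrative column names: perc_cyl4, perc_cyl6, perc_cyl8.
names(props) <- paste0("perc_cyl", colnames(tab))

# Add the carb value and the number of cars per carb as leading columns.
result <- cbind(
  carb = as.numeric(rownames(tab)),
  n = as.vector(rowSums(tab)),
  props
)
```

Because the column names are built from colnames(tab), this should extend to any number of cylinder levels without extra code.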
Solution 2:[2]
What I'd probably suggest is making a group size column that holds the number of cars per carb, with something like
count_df <- df %>% count(carb) %>% rename(group_size = n)
Then you can inner join that to the table of counts per carb/cyl combination
combo_df <- df %>% count(carb, cyl) %>% inner_join(count_df, by = "carb")
Then calculate the percentage with
combo_df %>% mutate(perc = (n / group_size) * 100)
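Put together, the join approach might look like the sketch below. It assumes that group_size is meant to be the number of cars per carb and n the number per carb/cyl combination. The complete() step from tidyr is an addition that is not in the answer: count() drops combinations that never occur, so without it a row such as carb 1 with 8 cylinders would be missing instead of showing 0.

```r
library(dplyr)
library(tidyr)

# Denominator: number of cars for each carb value.
count_df <- mtcars %>%
  count(carb, name = "group_size")

# Numerator: number of cars for each carb/cyl combination.
# complete() adds the combinations that do not occur, with n = 0.
perc_df <- mtcars %>%
  count(carb, cyl) %>%
  complete(carb, cyl, fill = list(n = 0L)) %>%
  inner_join(count_df, by = "carb") %>%
  mutate(perc = n / group_size * 100)
```

The result is in long format, with one row per carb/cyl combination.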
Solution 3:[3]
This can be made more succinct, but here's a starting point, using summarise
mtcars %>%
  group_by(carb) %>%
  summarise(n(),
            sum(cyl == 4),
            sum(cyl == 6),
            sum(cyl == 8),
            mean(cyl == 4),
            mean(cyl == 6),
            mean(cyl == 8))
#> # A tibble: 6 x 8
#> carb `n()` `sum(cyl == 4)` `sum(cyl == 6)` `sum(cyl == 8)` `mean(cyl == 4)` `mean(cyl == 6)` `mean(cyl == 8)`
#> <dbl> <int> <int> <int> <int> <dbl> <dbl> <dbl>
#> 1 1 7 5 2 0 0.714 0.286 0
#> 2 2 10 6 0 4 0.6 0 0.4
#> 3 3 3 0 0 3 0 0 1
#> 4 4 10 0 4 6 0 0.4 0.6
#> 5 6 1 0 1 0 0 1 0
#> 6 8 1 0 0 1 0 0 1
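One way this could be made more succinct, without writing out each cylinder value by hand, is to count the combinations once and then reshape with pivot_wider() from tidyr. This is a sketch of an alternative and not the answer author's code. The names wide and total and the prefix perc_cyl are illustrative.

```r
library(dplyr)
library(tidyr)

wide <- mtcars %>%
  count(carb, cyl) %>%                 # cars per carb/cyl combination
  group_by(carb) %>%
  mutate(total = sum(n),               # cars per carb
         perc = n / total) %>%         # share of each cyl within a carb
  ungroup() %>%
  select(-n) %>%
  pivot_wider(names_from = cyl,
              values_from = perc,
              names_prefix = "perc_cyl",
              values_fill = list(perc = 0)) %>%  # absent combinations become 0
  rename(n = total)
```

This should give one row per carb, with the columns carb, n, perc_cyl4, perc_cyl6, and perc_cyl8, and it picks up additional cyl levels automatically if the data contains them.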
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | zx8754 |
| Solution 2 | Sethzard |
| Solution 3 |
