How to calculate many percentages at once without making your script too big

The mtcars dataset contains the variable "carb" with the number of carburetors. First I want to find out how many cars have 1, 2, 3, etc. carburetors. I used the dplyr verb count().

library(dplyr)

df <- mtcars 

N <- df %>%
  count(carb)

which results in:

  carb  n
1    1  7
2    2 10
3    3  3
4    4 10
5    6  1
6    8  1

Then I want to know how many cars with 1 carb, with 2 carbs, with 3 carbs, etc. have either 4, 6, or 8 cylinders.

For example, I used filter() to find the number of cars with 1 carb and 4 cylinders:

carb1cyl4 <- df %>%
  filter(carb == 1, cyl == 4) %>%
  count() %>%
  rename(carb1cyl4 = n)

which results in:

  carb1cyl4
1         5

I did the same for 6 and 8 cylinders, with the following results:

  carb1cyl6
1         2
  carb1cyl8
1         0

If I continue this for all carbs, I could combine the results with bind_rows() and bind_cols() and then calculate the share of cars for each carb and cyl combination using mutate(carbXcylX / N), i.e. dividing the number of cars for each carb/cyl combination by the number of cars with the corresponding number of carbs.

The problem is that my real dataset is much larger, so continuing this way would take ages and be error-prone. Is there another way to calculate this?

The first row of the final result should look like this:

  carb  n  perc1cy4  perc1cy6 perc1cy8
1    1  7 0.7142857 0.2857143        0

Thank you in advance!

r dplyr percentage data-wrangling

Solution 1:^[1]

Using table:

cbind(n = table(mtcars$carb),
      prop.table(with(mtcars, table(carb, cyl)), margin = 1))
#    n         4         6   8
# 1  7 0.7142857 0.2857143 0.0
# 2 10 0.6000000 0.0000000 0.4
# 3  3 0.0000000 0.0000000 1.0
# 4 10 0.0000000 0.4000000 0.6
# 6  1 0.0000000 1.0000000 0.0
# 8  1 0.0000000 0.0000000 1.0
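To see what margin = 1 is doing in the call above, here is a minimal base R sketch. It divides every cell of the carb-by-cyl table by its row total, so each row sums to 1. The names counts, props and by_hand are only illustrative.

```r
# Cross-tabulate carburetors (rows) against cylinders (columns).
counts <- with(mtcars, table(carb, cyl))

# margin = 1 divides each cell by the total of its row.
props <- prop.table(counts, margin = 1)

# The same proportions computed by hand: cell count / row total.
by_hand <- counts / rowSums(counts)

# Both approaches agree, and every row of props sums to 1.
all.equal(as.vector(props), as.vector(by_hand))
rowSums(props)
```

Using margin = 2 instead would divide by column totals, giving the share of each carb value within a cylinder group.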

Solution 2:^[2]

What I'd probably suggest is making a group size column with something like

group_size_df <- df %>% count(carb) %>% rename(group_size = n)

Then you can inner join that to the table of counts per carb and cyl combination

count_df <- df %>% count(carb, cyl) %>% inner_join(group_size_df, by = "carb")

Then calculate the percentage with

count_df %>% mutate(perc = (n / group_size) * 100)
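Put together as one runnable pipeline, the join approach might look like the sketch below. It assumes the tidyr package is installed for the optional reshaping step, and the names group_sizes, carb_cyl, wide and the perc_cyl prefix are illustrative choices, not part of the original answer. It keeps proportions rather than percentages so the result matches the layout asked for in the question; multiply by 100 if percentages are wanted.

```r
library(dplyr)
library(tidyr)  # assumption: only needed for the optional pivot_wider() step

df <- mtcars

# Number of cars per carb value (the denominator).
group_sizes <- df %>%
  count(carb, name = "group_size")

# Number of cars per carb/cyl combination, joined to its denominator.
carb_cyl <- df %>%
  count(carb, cyl) %>%
  inner_join(group_sizes, by = "carb") %>%
  mutate(perc = n / group_size)

# Optional: one column per cylinder count; combinations that never
# occur (e.g. 1 carb with 8 cylinders) are filled with 0.
wide <- carb_cyl %>%
  select(carb, group_size, cyl, perc) %>%
  pivot_wider(names_from = cyl, values_from = perc,
              names_prefix = "perc_cyl", values_fill = list(perc = 0))

wide
```

Because the counts are computed with count(carb, cyl), new carb or cyl values in a larger dataset are picked up without any extra code.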

Solution 3:^[3]

This can be made more succinct, but here's a starting point, using summarise

mtcars %>%
  group_by(carb) %>%
  summarise(n(),
            sum(cyl == 4),
            sum(cyl == 6),
            sum(cyl == 8),
            mean(cyl == 4),
            mean(cyl == 6),
            mean(cyl == 8))

#> # A tibble: 6 x 8
#>    carb `n()` `sum(cyl == 4)` `sum(cyl == 6)` `sum(cyl == 8)` `mean(cyl == 4)` `mean(cyl == 6)` `mean(cyl == 8)`
#>   <dbl> <int>           <int>           <int>           <int>            <dbl>            <dbl>            <dbl>
#> 1     1     7               5               2               0            0.714            0.286              0  
#> 2     2    10               6               0               4            0.6              0                  0.4
#> 3     3     3               0               0               3            0                0                  1  
#> 4     4    10               0               4               6            0                0.4                0.6
#> 5     6     1               0               1               0            0                1                  0  
#> 6     8     1               0               0               1            0                0                  1
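One way to trim this down, as a sketch: drop the sum() columns and name the rest. mean(cyl == 4) gives the share directly because a logical vector is averaged with TRUE counted as 1 and FALSE as 0. The column names perc_cyl4, perc_cyl6 and perc_cyl8 are illustrative.

```r
library(dplyr)

result <- mtcars %>%
  group_by(carb) %>%
  summarise(n = n(),                      # cars per carb value
            perc_cyl4 = mean(cyl == 4),   # share of those with 4 cylinders
            perc_cyl6 = mean(cyl == 6),   # share with 6 cylinders
            perc_cyl8 = mean(cyl == 8))   # share with 8 cylinders

result
```

This still needs one line per cylinder value, so it suits a small, fixed set of categories; for many categories a count-based approach scales better.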

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	zx8754
Solution 2	Sethzard
Solution 3
