'Add margin row totals in dplyr chain

I would like to add overall summary rows while also calculating summaries by group using dplyr. I have found various questions asking how to do this, e.g. here, here, and here, but no clear solution. One possible approach is to perform count twice and bind the rows:

mtcars %>% 
  count(cyl, gear) %>% 
  bind_rows(
    count(mtcars, gear)
  )

which nearly produces what I need (the left-most column has NAs rather than 'Total' or similar):

     cyl  gear     n
   <dbl> <dbl> <int>
1      4     3     1
2      4     4     8
3      4     5     2
4      6     3     2
5      6     4     4
6      6     5     1
7      8     3    12
8      8     5     2
9     NA     3    15
10    NA     4    12
11    NA     5     5

Am I missing an easier/built-in solution?



Solution 1:[1]

With adorn_totals() from the janitor package:

library(janitor)
mtcars %>%
  tabyl(cyl, gear) %>%
  adorn_totals("row") 

   cyl  3  4 5
     4  1  8 2
     6  2  4 1
     8 12  0 2
 Total 15 12 5

To get from there to the "long" form in your post, add tidyr::gather() to the pipeline:

mtcars %>%
  tabyl(cyl, gear) %>%
  adorn_totals("row") %>%
  tidyr::gather(gear, n, 2:ncol(.), convert = TRUE)

     cyl gear  n
1      4    3  1
2      6    3  2
3      8    3 12
4  Total    3 15
5      4    4  8
6      6    4  4
7      8    4  0
8  Total    4 12
9      4    5  2
10     6    5  1
11     8    5  2
12 Total    5  5

Self-promotion alert, I authored this package - adding this answer b/c it's a genuinely efficient solution here.

Solution 2:[2]

One option is with do

mtcars %>%
   count(cyl, gear) %>%
   ungroup() %>% 
   mutate(cyl=as.character(cyl)) %>% 
   do(bind_rows(., data.frame(cyl="Total", count(mtcars, gear)))) 
   #or replace the last 'do' step with 
   #bind_rows(cbind(cyl='Total', count(mtcars, gear))) #from  @JonnyPolonsky's comments

#      cyl  gear     n
#   <chr> <dbl> <int>
#1      4     3     1
#2      4     4     8
#3      4     5     2
#4      6     3     2
#5      6     4     4
#6      6     5     1
#7      8     3    12
#8      8     5     2
#9  Total     3    15
#10 Total     4    12
#11 Total     5     5

Solution 3:[3]

An addition to @arkrun's answer that is not easy to add as a comment:

Although a little more complex, this format allows for previous modifications in the data frame. Useful when there is a longer chain of verbs before the table is generated. (You want to change names, or select only specific variables)

mtcars %>%
   count(cyl, gear) %>%
   ungroup() %>% 
   mutate(cyl=as.character(cyl))
bind_rows(group_by(.,gear) %>%
              summarise(n=sum(n)) %>%
              mutate(cyl='Total')) %>%
spread(cyl)

## A tibble: 3 x 5
#   gear   `4`   `6`   `8` Total
#* <dbl> <dbl> <dbl> <dbl> <dbl>
#1     3     1     2    12    15
#2     4     8     4     0    12
#3     5     2     1     2     5

This can also be doubled up to generate a total row for the spread as well.

mtcars %>%
  count(cyl, gear) %>%
  ungroup() %>% 
  mutate(cyl=as.character(cyl),
         gear = as.character(gear)) %>%
  bind_rows(group_by(.,gear) %>%
              summarise(n=sum(n)) %>%
              mutate(cyl='Total')) %>%
  bind_rows(group_by(.,cyl) %>%
              summarise(n=sum(n)) %>%
              mutate(gear='Total')) %>%
  spread(cyl,n,fill=0)

# A tibble: 4 x 5
   gear   `4`   `6`   `8` Total
* <chr> <dbl> <dbl> <dbl> <dbl>
1     3     1     2    12    15
2     4     8     4     0    12
3     5     2     1     2     5
4 Total    11     7    14    32

Solution 4:[4]

Here's a take on the accepted answer, using new functions introduced in dplyr 1.0.0 and tidyr 1.0.0.

We pivot the counts using the new tidyr::pivot_wider. Then use the new dplyr::rowwise and dplyr::c_across to sum the counts for the total column.

We can also use tidyr::pivot_longer to get in desired long format.

library(dplyr, warn.conflicts = FALSE)
library(tidyr)

cyl_gear_sum <- mtcars %>%
  count(cyl, gear) %>%
  pivot_wider(names_from = gear, values_from = n, values_fill = list(n = 0)) %>%
  rowwise(cyl) %>%
  mutate(gear_total = sum(c_across()))

cyl_gear_sum
#> # A tibble: 3 x 5
#> # Rowwise:  cyl
#>     cyl   `3`   `4`   `5` gear_total
#>   <dbl> <int> <int> <int>      <int>
#> 1     4     1     8     2         11
#> 2     6     2     4     1          7
#> 3     8    12     0     2         14

# total as row
cyl_gear_sum %>% 
  pivot_longer(-cyl, names_to = "gear", values_to = "n")
#> # A tibble: 12 x 3
#>      cyl gear           n
#>    <dbl> <chr>      <int>
#>  1     4 3              1
#>  2     4 4              8
#>  3     4 5              2
#>  4     4 gear_total    11
#>  5     6 3              2
#>  6     6 4              4
#>  7     6 5              1
#>  8     6 gear_total     7
#>  9     8 3             12
#> 10     8 4              0
#> 11     8 5              2
#> 12     8 gear_total    14

Created on 2020-04-07 by the reprex package (v0.3.0)

Solution 5:[5]

If you would like to have truly universal solution you could use a combination of purrr::map_df, base::c and base::sum mtcars %>% purrr::map_df(~c(.x, sum(.x, na.rm=TRUE))) %>% tail

P.S. All columns must be numeric!

Solution 6:[6]

library(tidyverse)

#Pre-process mtcars 
mtcars_pre <-
  as_tibble(mtcars) %>% #remove rownames
  select(cyl, gear) %>% 
  count(cyl, gear) %>% #add row totals
  mutate(
    cyl = as.character(cyl) #Convert to character in order to add "Total"
  )

#> # A tibble: 8 x 3
#>   cyl    gear     n
#>   <chr> <dbl> <int>
#> 1 4         3     1
#> 2 4         4     8
#> 3 4         5     2
#> 4 6         3     2
#> 5 6         4     4
#> 6 6         5     1
#> 7 8         3    12
#> 8 8         5     2

mtcars_totals <- 
  mtcars_pre %>%
  bind_rows(
    mtcars_pre %>%
      group_by(gear) %>%
      summarise(across(where(is.numeric), ~ sum(.x, na.rm = TRUE))) %>%
      mutate("cyl" = "Total")
  ) %>% 
  arrange(
    gear
  )

#> # A tibble: 11 x 3
#>    cyl    gear     n
#>    <chr> <dbl> <int>
#>  1 4         3     1
#>  2 6         3     2
#>  3 8         3    12
#>  4 Total     3    15
#>  5 4         4     8
#>  6 6         4     4
#>  7 Total     4    12
#>  8 4         5     2
#>  9 6         5     1
#> 10 8         5     2
#> 11 Total     5     5

Created on 2021-07-13 by the reprex package (v2.0.0)

Solution 7:[7]

Here is my suggestion.

  1. Find the combination of relevant grouping variables via the powerSet function.
  2. split the data frame into a list, grouped by the powerSet of the grouping variables
  3. summarise the data frame using an appropriate summary function (e.g. mean)
  4. bind_rows the result - summaries are now NA because these columns are dropped in step 3
  5. replace NA values of the grouping variables using appropriate names.

Note. If grouping variables are numeric, they will not be dropped in step 3 - I therefore mutate them to character variables.

powerSetList <- function(df, ...) {
  rje::powerSet(x = c(...))[-1] %>% lapply(function(x, tdf = df) group_by(tdf, .dots=x)) %>% c(list(tibble(df)), .)
} 

mtcars %>% 
  mutate_at(vars("cyl", "gear"), as.character) %>%
  powerSetList("cyl", "gear") %>%
  map(~summarise_if(., is.numeric, .funs = mean)) %>%
  bind_rows() %>%
  replace_na(list(gear = "all gears",
                  cyl = "all cyls"))

Solution 8:[8]

maybe that works:

library(dplyr)
mtcars %>%
    # convert cyl column as.character
    mutate_at("cyl",as.character) %>%
    # add a copy of the origina data with cyl column = 'TOTAL'
    bind_rows(mutate(mtcars, cyl="total")) %>%
    group_by(cyl) %>% summarise_all(sum)

Solution 9:[9]

Modified jlao's code a bit:

mtcars %>%
  # convert cyl column as.character
  mutate_at("cyl",as.character) %>%
  # add a copy of the original data with cyl column = 'TOTAL'
  bind_rows(mutate(mtcars, cyl="total")) %>%
  count(cyl,gear)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Sam Firke
Solution 2
Solution 3 Bishops_Guest
Solution 4 tmastny
Solution 5 Maksym Tulyuk
Solution 6 Nivel
Solution 7 mzuba
Solution 8 jlao
Solution 9 hotaka