'Add margin row totals in dplyr chain
I would like to add overall summary rows while also calculating summaries by group using dplyr. I have found various questions asking how to do this, e.g. here, here, and here, but no clear solution. One possible approach is to perform count twice and bind the rows:
mtcars %>%
count(cyl, gear) %>%
bind_rows(
count(mtcars, gear)
)
which nearly produces what I need (the left-most column has NAs rather than 'Total' or similar):
cyl gear n
<dbl> <dbl> <int>
1 4 3 1
2 4 4 8
3 4 5 2
4 6 3 2
5 6 4 4
6 6 5 1
7 8 3 12
8 8 5 2
9 NA 3 15
10 NA 4 12
11 NA 5 5
Am I missing an easier/built-in solution?
Solution 1:[1]
With adorn_totals() from the janitor package:
library(janitor)
mtcars %>%
tabyl(cyl, gear) %>%
adorn_totals("row")
cyl 3 4 5
4 1 8 2
6 2 4 1
8 12 0 2
Total 15 12 5
To get from there to the "long" form in your post, add tidyr::gather() to the pipeline:
mtcars %>%
tabyl(cyl, gear) %>%
adorn_totals("row") %>%
tidyr::gather(gear, n, 2:ncol(.), convert = TRUE)
cyl gear n
1 4 3 1
2 6 3 2
3 8 3 12
4 Total 3 15
5 4 4 8
6 6 4 4
7 8 4 0
8 Total 4 12
9 4 5 2
10 6 5 1
11 8 5 2
12 Total 5 5
Self-promotion alert, I authored this package - adding this answer b/c it's a genuinely efficient solution here.
Solution 2:[2]
One option is with do
mtcars %>%
count(cyl, gear) %>%
ungroup() %>%
mutate(cyl=as.character(cyl)) %>%
do(bind_rows(., data.frame(cyl="Total", count(mtcars, gear))))
#or replace the last 'do' step with
#bind_rows(cbind(cyl='Total', count(mtcars, gear))) #from @JonnyPolonsky's comments
# cyl gear n
# <chr> <dbl> <int>
#1 4 3 1
#2 4 4 8
#3 4 5 2
#4 6 3 2
#5 6 4 4
#6 6 5 1
#7 8 3 12
#8 8 5 2
#9 Total 3 15
#10 Total 4 12
#11 Total 5 5
Solution 3:[3]
An addition to @arkrun's answer that is not easy to add as a comment:
Although a little more complex, this format allows for previous modifications in the data frame. Useful when there is a longer chain of verbs before the table is generated. (You want to change names, or select only specific variables)
mtcars %>%
count(cyl, gear) %>%
ungroup() %>%
mutate(cyl=as.character(cyl))
bind_rows(group_by(.,gear) %>%
summarise(n=sum(n)) %>%
mutate(cyl='Total')) %>%
spread(cyl)
## A tibble: 3 x 5
# gear `4` `6` `8` Total
#* <dbl> <dbl> <dbl> <dbl> <dbl>
#1 3 1 2 12 15
#2 4 8 4 0 12
#3 5 2 1 2 5
This can also be doubled up to generate a total row for the spread as well.
mtcars %>%
count(cyl, gear) %>%
ungroup() %>%
mutate(cyl=as.character(cyl),
gear = as.character(gear)) %>%
bind_rows(group_by(.,gear) %>%
summarise(n=sum(n)) %>%
mutate(cyl='Total')) %>%
bind_rows(group_by(.,cyl) %>%
summarise(n=sum(n)) %>%
mutate(gear='Total')) %>%
spread(cyl,n,fill=0)
# A tibble: 4 x 5
gear `4` `6` `8` Total
* <chr> <dbl> <dbl> <dbl> <dbl>
1 3 1 2 12 15
2 4 8 4 0 12
3 5 2 1 2 5
4 Total 11 7 14 32
Solution 4:[4]
Here's a take on the accepted answer, using new functions introduced in dplyr 1.0.0 and tidyr 1.0.0.
We pivot the counts using the new tidyr::pivot_wider. Then use the new dplyr::rowwise and dplyr::c_across to sum the counts for the total column.
We can also use tidyr::pivot_longer to get in desired long format.
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
cyl_gear_sum <- mtcars %>%
count(cyl, gear) %>%
pivot_wider(names_from = gear, values_from = n, values_fill = list(n = 0)) %>%
rowwise(cyl) %>%
mutate(gear_total = sum(c_across()))
cyl_gear_sum
#> # A tibble: 3 x 5
#> # Rowwise: cyl
#> cyl `3` `4` `5` gear_total
#> <dbl> <int> <int> <int> <int>
#> 1 4 1 8 2 11
#> 2 6 2 4 1 7
#> 3 8 12 0 2 14
# total as row
cyl_gear_sum %>%
pivot_longer(-cyl, names_to = "gear", values_to = "n")
#> # A tibble: 12 x 3
#> cyl gear n
#> <dbl> <chr> <int>
#> 1 4 3 1
#> 2 4 4 8
#> 3 4 5 2
#> 4 4 gear_total 11
#> 5 6 3 2
#> 6 6 4 4
#> 7 6 5 1
#> 8 6 gear_total 7
#> 9 8 3 12
#> 10 8 4 0
#> 11 8 5 2
#> 12 8 gear_total 14
Created on 2020-04-07 by the reprex package (v0.3.0)
Solution 5:[5]
If you would like to have truly universal solution you could use a combination of purrr::map_df, base::c and base::sum
mtcars %>%
purrr::map_df(~c(.x, sum(.x, na.rm=TRUE))) %>%
tail
P.S. All columns must be numeric!
Solution 6:[6]
library(tidyverse)
#Pre-process mtcars
mtcars_pre <-
as_tibble(mtcars) %>% #remove rownames
select(cyl, gear) %>%
count(cyl, gear) %>% #add row totals
mutate(
cyl = as.character(cyl) #Convert to character in order to add "Total"
)
#> # A tibble: 8 x 3
#> cyl gear n
#> <chr> <dbl> <int>
#> 1 4 3 1
#> 2 4 4 8
#> 3 4 5 2
#> 4 6 3 2
#> 5 6 4 4
#> 6 6 5 1
#> 7 8 3 12
#> 8 8 5 2
mtcars_totals <-
mtcars_pre %>%
bind_rows(
mtcars_pre %>%
group_by(gear) %>%
summarise(across(where(is.numeric), ~ sum(.x, na.rm = TRUE))) %>%
mutate("cyl" = "Total")
) %>%
arrange(
gear
)
#> # A tibble: 11 x 3
#> cyl gear n
#> <chr> <dbl> <int>
#> 1 4 3 1
#> 2 6 3 2
#> 3 8 3 12
#> 4 Total 3 15
#> 5 4 4 8
#> 6 6 4 4
#> 7 Total 4 12
#> 8 4 5 2
#> 9 6 5 1
#> 10 8 5 2
#> 11 Total 5 5
Created on 2021-07-13 by the reprex package (v2.0.0)
Solution 7:[7]
Here is my suggestion.
- Find the combination of relevant grouping variables via the powerSet function.
- split the data frame into a list, grouped by the powerSet of the grouping variables
- summarise the data frame using an appropriate summary function (e.g. mean)
- bind_rows the result - summaries are now NA because these columns are dropped in step 3
- replace NA values of the grouping variables using appropriate names.
Note. If grouping variables are numeric, they will not be dropped in step 3 - I therefore mutate them to character variables.
powerSetList <- function(df, ...) {
rje::powerSet(x = c(...))[-1] %>% lapply(function(x, tdf = df) group_by(tdf, .dots=x)) %>% c(list(tibble(df)), .)
}
mtcars %>%
mutate_at(vars("cyl", "gear"), as.character) %>%
powerSetList("cyl", "gear") %>%
map(~summarise_if(., is.numeric, .funs = mean)) %>%
bind_rows() %>%
replace_na(list(gear = "all gears",
cyl = "all cyls"))
Solution 8:[8]
maybe that works:
library(dplyr)
mtcars %>%
# convert cyl column as.character
mutate_at("cyl",as.character) %>%
# add a copy of the origina data with cyl column = 'TOTAL'
bind_rows(mutate(mtcars, cyl="total")) %>%
group_by(cyl) %>% summarise_all(sum)
Solution 9:[9]
Modified jlao's code a bit:
mtcars %>%
# convert cyl column as.character
mutate_at("cyl",as.character) %>%
# add a copy of the original data with cyl column = 'TOTAL'
bind_rows(mutate(mtcars, cyl="total")) %>%
count(cyl,gear)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Sam Firke |
| Solution 2 | |
| Solution 3 | Bishops_Guest |
| Solution 4 | tmastny |
| Solution 5 | Maksym Tulyuk |
| Solution 6 | Nivel |
| Solution 7 | mzuba |
| Solution 8 | jlao |
| Solution 9 | hotaka |
