'Loop to make a basic table for many variables by condition

I am running an experiment where participants are randomly assigned to one of two conditions, and then I collect data on several variables. Here is an example of my code:

df <- data.frame(condition =c(1,1,1,1,1,-1,-1,-1,-1,-1),
              var1 = c(6,6,4,7,5,6,6,6,4,7),
              var2 = c(3,4,3,6,7,1,2,1,2,5),
              var3 = c(2,2,6,6,7,1,7,7,3,1),
              var4 = c(6,4,3,6,4,1,3,3,4,4))

df$condition = factor(df$condition, levels = c(-1,1),labels = c("Digital","Physical"))

For each variable (var1, var2, etc.) I would like a little table with the count, mean, and standard deviation. This code creates the kind of table that I want:

 group_by(df, df$condition) %>% 
  summarise(
    count = n(),
    mean = mean(var1),
    sd = sd(var1))

But because I have many variables, I would like to use some kind of loop (or "lapply"?) to create all these tables at once. It would also be great if each table could show the name of the variable. Thanks!

Solution 1:^[1]

df <- data.frame(condition =c(1,1,1,1,1,-1,-1,-1,-1,-1),
                 var1 = c(6,6,4,7,5,6,6,6,4,7),
                 var2 = c(3,4,3,6,7,1,2,1,2,5),
                 var3 = c(2,2,6,6,7,1,7,7,3,1),
                 var4 = c(6,4,3,6,4,1,3,3,4,4))

df$condition = factor(df$condition, levels = c(-1,1),labels = c("Digital","Physical"))





for (var in names(df)[2:length(names(df))]){

  
  tab <- group_by(df, condition) %>% 
    select(c("condition", var)) %>% 
    dplyr::rename(v = var) %>% 
    summarise(
      count = n(),
      mean = mean(v),
      sd = sd(v) 
      )
  print(var)
  print(tab)
}

gives


[1] "var1"
# A tibble: 2 × 4
  condition count  mean    sd
  <fct>     <int> <dbl> <dbl>
1 Digital       5   5.8  1.10
2 Physical      5   5.6  1.14
[1] "var2"
# A tibble: 2 × 4
  condition count  mean    sd
  <fct>     <int> <dbl> <dbl>
1 Digital       5   2.2  1.64
2 Physical      5   4.6  1.82
[1] "var3"
# A tibble: 2 × 4
  condition count  mean    sd
  <fct>     <int> <dbl> <dbl>
1 Digital       5   3.8  3.03
2 Physical      5   4.6  2.41
[1] "var4"
# A tibble: 2 × 4
  condition count  mean    sd
  <fct>     <int> <dbl> <dbl>
1 Digital       5   3    1.22
2 Physical      5   4.6  1.34
>

Solution 2:^[2]

You can just use summarise on all the variables, i.e.

library(dplyr)

group_by(df, condition) %>% 
     summarise(across(everything(), ~ c(count = n(), mean = mean(.), sd = sd(.))))

`summarise()` has grouped output by 'condition'. You can override using the `.groups` argument.
# A tibble: 6 x 5
# Groups:   condition [2]
  condition  var1  var2  var3  var4
  <fct>     <dbl> <dbl> <dbl> <dbl>
1 Digital    5     5     5     5   
2 Digital    5.8   2.2   3.8   3   
3 Digital    1.10  1.64  3.03  1.22
4 Physical   5     5     5     5   
5 Physical   5.6   4.6   4.6   4.6 
6 Physical   1.14  1.82  2.41  1.34

You can control the output structure by changing object in the formula, i.e.

group_by(df, condition) %>% 
     summarise(across(everything(), ~ data.frame(count = n(), mean = mean(.), sd = sd(.))))
# A tibble: 2 x 5
  condition var1$count $mean   $sd var2$count $mean   $sd var3$count $mean   $sd var4$count $mean   $sd
  <fct>          <int> <dbl> <dbl>      <int> <dbl> <dbl>      <int> <dbl> <dbl>      <int> <dbl> <dbl>
1 Digital            5   5.8  1.10          5   2.2  1.64          5   3.8  3.03          5   3    1.22
2 Physical           5   5.6  1.14          5   4.6  1.82          5   4.6  2.41          5   4.6  1.34

Solution 3:^[3]

We could still do it my summarise using a list:

library(dplyr)

df %>% 
  group_by(condition) %>% 
  summarise(across(starts_with("var"), .f = list(n = ~n(),
                                         mean = mean,
                                         sd = sd), na.rm = TRUE))

  condition var1_n var1_mean var1_sd var2_n var2_mean var2_sd var3_n var3_mean var3_sd var4_n var4_mean var4_sd
      <dbl>  <int>     <dbl>   <dbl>  <int>     <dbl>   <dbl>  <int>     <dbl>   <dbl>  <int>     <dbl>   <dbl>
1        -1      5       5.8    1.10      5       2.2    1.64      5       3.8    3.03      5       3      1.22
2         1      5       5.6    1.14      5       4.6    1.82      5       4.6    2.41      5       4.6    1.34

Solution 4:^[4]

Rather than lapply, the function of choice is aggregate, a close relative to the *apply family at least. Put in a custom function f.

f <- \(x) c(n=length(x), mu=mean(x), sd=sd(x))

aggregate(. ~ condition, df, f)
#   condition   var1.n  var1.mu  var1.sd   var2.n  var2.mu  var2.sd   var3.n  var3.mu  var3.sd   var4.n  var4.mu  var4.sd
# 1   Digital 5.000000 5.800000 1.095445 5.000000 2.200000 1.643168 5.000000 3.800000 3.033150 5.000000 3.000000 1.224745
# 2  Physical 5.000000 5.600000 1.140175 5.000000 4.600000 1.816590 5.000000 4.600000 2.408319 5.000000 4.600000 1.341641

If you want to aggregate on a specific set of variables (e.g. assembled with grep), use list notation instead.

aggregate(df[grep('^var', names(df))], df['condition'], f)

Solution 5:^[5]

You can use gtsummary here if you need to present the results.

Example one below will make one table with all of your variables. Example two will split each variable into its own table (if you need them to be seperate)

library(gtsummary)

#example one: 
tbl_summary(df, by = condition,
            type = list(everything()~"continuous"),
            statistic = list(all_continuous()~"{mean} ({sd}) ")) 

#example two:
tbl_summary(df, by = condition,
            type = list(everything()~"continuous"),
            statistic = list(all_continuous()~"{mean} ({sd}) ")) %>% 
      tbl_split(variables = c(var1, var2,var3,var4))

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	goblinshark
Solution 2	Sotos
Solution 3	TarJae
Solution 4
Solution 5	Mike

'Loop to make a basic table for many variables by condition

Solution 1:[1]

Solution 2:[2]

Solution 3:[3]

Solution 4:[4]

Solution 5:[5]