Show count of unique values in datasummary and combine two different tables of descriptive statistics using data
I really like the modelsummary package, and I'm trying to produce a single table that mixes descriptive statistics of different types. The first part is easy: I can produce basic descriptives of var2 and var3, as in the code below. I can't get the second part right, though.
- I'd like to get a count of the unique entries of the variable var1, i.e. 26.
- I'd like to be able to combine the two into one table.
var1<-rep(LETTERS, 5)
var2<-rnorm(length(var1), mean=50, sd=10)
var3<-rnorm(length(var1), mean=10, sd=5)
df<-data.frame(var1, var2, var3)
library(gr)
library(modelsummary)
# This gets the descriptives of var2 and var3
datasummary(var2 + var3 ~ Mean + SD + N, data = df)
# This returns a long column with the number of entries for each value of var1;
# I would just like the number 26 here, combined with the table above
datasummary(var1 ~ length, data = df)
Solution 1:[1]
Based on the add_rows argument (https://vincentarelbundock.github.io/modelsummary/articles/datasummary.html#add_rows):
# One cell per column of the main table: row label, Mean, SD, N
new_row <- data.frame("var1",
                      "-",
                      "-",
                      length(unique(var1)))
datasummary(var2 + var3 ~ Mean + SD + N,
            data = df,
            add_rows = new_row)
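As a minimal sketch of the same idea, the row can be built from the data frame column and given a `position` attribute, which the modelsummary documentation describes as the way to control where `add_rows` rows appear. The seed, the row label, and the choice of position 1 are assumptions made for illustration; the `datasummary()` call only runs if modelsummary is installed.

```r
# Rebuild the question's example data
set.seed(1)  # assumption: seed added only so the numbers are reproducible
var1 <- rep(LETTERS, 5)
df <- data.frame(
  var1 = var1,
  var2 = rnorm(length(var1), mean = 50, sd = 10),
  var3 = rnorm(length(var1), mean = 10, sd = 5)
)

# Number of distinct values in var1
n_unique <- length(unique(df$var1))

# One cell per column of the main table: row label, Mean, SD, N
# (the label text is a hypothetical choice)
new_row <- data.frame("var1 (unique values)", "-", "-", n_unique)

# Optional: ask for the custom row to be placed first instead of last
attr(new_row, "position") <- 1

if (requireNamespace("modelsummary", quietly = TRUE)) {
  library(modelsummary)
  datasummary(var2 + var3 ~ Mean + SD + N, data = df, add_rows = new_row)
}
```

Leaving out the `position` attribute appends the row at the bottom of the table, as in the answer above.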
Solution 2:[2]
Mixing factor and numeric variables in datasummary() is kind of tricky. Here are two options.
The first approach is to create a first table with output="data.frame", and to feed it to the add_rows argument of a second table, inserting “empty” columns as necessary to align the two tables:
library(modelsummary)
var1<-rep(LETTERS[1:5], 5)
var2<-rep(LETTERS[8:12], 5)
var3<-rnorm(length(var1), mean=50, sd=10)
var4<-rnorm(length(var1), mean=10, sd=5)
df<-data.frame(var1, var2, var3, var4)
# function to insert empty columns
empty <- function(...) ""
ar <- datasummary(var1 + var2 ~ empty + empty + N,
                  data = df,
                  output = "data.frame")
datasummary(var3 + var4 ~ Heading("") * empty + Mean + SD + N,
            data = df,
            add_rows = ar)
|  |  | Mean | SD | N |
|---|---|---|---|---|
| var3 |  | 52.66 | 9.35 | 25 |
| var4 |  | 9.21 | 5.25 | 25 |
| var1 | A |  |  | 5 |
|  | B |  |  | 5 |
|  | C |  |  | 5 |
|  | D |  |  | 5 |
|  | E |  |  | 5 |
| var2 | H |  |  | 5 |
|  | I |  |  | 5 |
|  | J |  |  | 5 |
|  | K |  |  | 5 |
|  | L |  |  | 5 |
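The alignment requirement can be checked before combining the tables. The sketch below, which assumes modelsummary is installed and otherwise does nothing, renders both tables with output = "data.frame" and compares their column counts; the seed and the name `main` are illustrative additions, not part of the original answer.

```r
if (requireNamespace("modelsummary", quietly = TRUE)) {
  library(modelsummary)

  set.seed(1)  # assumption: seed added only for reproducibility
  df <- data.frame(
    var1 = rep(LETTERS[1:5], 5),
    var2 = rep(LETTERS[8:12], 5),
    var3 = rnorm(25, mean = 50, sd = 10),
    var4 = rnorm(25, mean = 10, sd = 5)
  )

  # function to insert empty columns
  empty <- function(...) ""

  # Categorical table that will be passed to add_rows
  ar <- datasummary(var1 + var2 ~ empty + empty + N,
                    data = df,
                    output = "data.frame")

  # Numeric table, rendered as a data frame only to inspect its shape
  main <- datasummary(var3 + var4 ~ Heading("") * empty + Mean + SD + N,
                      data = df,
                      output = "data.frame")

  # add_rows needs the same number of columns in both tables
  stopifnot(ncol(ar) == ncol(main))
}
```

If the check fails, adding or removing an `empty` term in one of the formulas is the adjustment the answer describes.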
The second approach is to use the datasummary_balance template function with ~1 as a formula argument. This is of course less flexible, but it works for simple cases:
datasummary_balance(~ 1, data = df)
|  |  | Mean | Std. Dev. |
|---|---|---|---|
| var3 |  | 52.7 | 9.4 |
| var4 |  | 9.2 | 5.2 |
|  |  | N | Pct. |
| var1 | A | 5 | 20.0 |
|  | B | 5 | 20.0 |
|  | C | 5 | 20.0 |
|  | D | 5 | 20.0 |
|  | E | 5 | 20.0 |
| var2 | H | 5 | 20.0 |
|  | I | 5 | 20.0 |
|  | J | 5 | 20.0 |
|  | K | 5 | 20.0 |
|  | L | 5 | 20.0 |
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Julian |
| Solution 2 | Vincent |
