'data frame column total in R
I have data like this (derived using the table() function):
dat <- read.table(text = "responses freq percent
A 9 25.7
B 13 37.1
C 10 28.6
D 3 8.6", header = TRUE)
dat
responses freq percent
A 9 25.7
B 13 37.1
C 10 28.6
D 3 8.6
All I want are row totals, so to create a new row at the bottom that says Total and then in column freq it will show 35 and in percent it will show 100. I am unable to find a solution. colSums doesn't work because of the first column which is a string.
Solution 1:[1]
One option is converting to 'matrix' and using addmargins to get the column sum as a separate row at the bottom. But, this will be a matrix.
m1 <- as.matrix(df1[-1])
rownames(m1) <- df1[,1]
res <- addmargins(m1, 1)
res
# freq percent
#A 9 25.7
#B 13 37.1
#C 10 28.6
#D 3 8.6
#Sum 35 100.0
If you want to convert to data.frame
data.frame(responses=rownames(res), res)
Another option would be getting the sum with colSums for the numeric columns (df1[-1]) (I think here is where the OP got into trouble, ie. applying the colSums on the entire dataset instead of subsetting), create a new data.frame with the responses column and rbind with the original dataset.
rbind(df1, data.frame(responses='Total', as.list(colSums(df1[-1]))))
# responses freq percent
#1 A 9 25.7
#2 B 13 37.1
#3 C 10 28.6
#4 D 3 8.6
#5 Total 35 100.0
data
df1 <- structure(list(responses = c("A", "B", "C", "D"), freq = c(9L,
13L, 10L, 3L), percent = c(25.7, 37.1, 28.6, 8.6)),
.Names = c("responses", "freq", "percent"), class = "data.frame",
row.names = c(NA, -4L))
Solution 2:[2]
This might be relevant, using SciencesPo package, see this example:
library(SciencesPo)
tab(mtcars,gear,cyl)
#output
=================================
cyl
--------------------
gear 4 6 8 Total
---------------------------------
3 1 2 12 15
6.7% 13% 80% 100%
4 8 4 0 12
66.7% 33% 0% 100%
5 2 1 2 5
40.0% 20% 40% 100%
---------------------------------
Total 11 7 14 32
34.4% 22% 44% 100%
=================================
Chi-Square Test for Independence
Number of cases in table: 32
Number of factors: 2
Test for independence of all factors:
Chisq = 18.036, df = 4, p-value = 0.001214
Chi-squared approximation may be incorrect
X^2 df P(> X^2)
Likelihood Ratio 23.260 4 0.00011233
Pearson 18.036 4 0.00121407
Phi-Coefficient : NA
Contingency Coeff.: 0.6
Cramer's V : 0.531
Solution 3:[3]
@akrun I posted it but you already did the same. Correct me if I'm wrong, I think we can just need this without creating a new data frame or using as.list.
rbind(df1, c("Total", colSums(df1[-1])))
Output:
responses freq percent
1 A 9 25.7
2 B 13 37.1
3 C 10 28.6
4 D 3 8.6
5 Total 35 100
sqldf
Classes of the data frame are preserved.library(sqldf)
sqldf("SELECT * FROM df1
UNION
SELECT 'Total', SUM(freq) AS freq, SUM(percent) AS percent FROM df1")
Solution 4:[4]
Or, alternatively you can use margin.table and rbind function within R-base. Two lines and voila...
PS: The lines here are longer as I am recreating the data, but you know what I mean :-)
Data
df1 <- matrix(c(9,25.7,13,37.1,10,28.6,3,8.6),ncol=2,byrow=TRUE)
colnames(df1) <- c("freq","percent")
rownames(df1) <- c("A","B","C","D")
Creating Total Calculation
Total <- margin.table(df1,2)
Combining Total Calculation to Original Data
df2 <- rbind(df,Total)
df2
Solution 5:[5]
Inelegant but it gets the job done, please provide reproducible data frames so we don't have to build them first:
data = data.frame(letters[1:4], c(9,13,10,3), c(25.7,37.1, 28.6, 8.6))
colnames(data) = c("X","Y","Z")
data = rbind(data[,1:3], matrix(c("Sum",lapply(data[,2:3], sum)), nrow = 1)[,1:3])
Solution 6:[6]
library(janitor)
dat %>%
adorn_totals("row")
responses freq percent
A 9 25.7
B 13 37.1
C 10 28.6
D 3 8.6
Total 35 100.0
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Community |
| Solution 2 | zx8754 |
| Solution 3 | |
| Solution 4 | greta_mj |
| Solution 5 | rg255 |
| Solution 6 | Sam Firke |
