'data frame column total in R

I have data like this (derived using the table() function):

dat <- read.table(text = "responses freq percent
A           9   25.7
B          13   37.1
C          10   28.6
D           3    8.6", header = TRUE)

dat

responses freq percent
A           9   25.7
B          13   37.1
C          10   28.6
D           3    8.6

All I want are row totals, so to create a new row at the bottom that says Total and then in column freq it will show 35 and in percent it will show 100. I am unable to find a solution. colSums doesn't work because of the first column which is a string.

r


Solution 1:[1]

One option is converting to 'matrix' and using addmargins to get the column sum as a separate row at the bottom. But, this will be a matrix.

  m1 <- as.matrix(df1[-1])
  rownames(m1) <- df1[,1]
  res <- addmargins(m1, 1)
  res
  #    freq percent
  #A      9    25.7
  #B     13    37.1
  #C     10    28.6
  #D      3     8.6
  #Sum   35   100.0

If you want to convert to data.frame

data.frame(responses=rownames(res), res)

Another option would be getting the sum with colSums for the numeric columns (df1[-1]) (I think here is where the OP got into trouble, ie. applying the colSums on the entire dataset instead of subsetting), create a new data.frame with the responses column and rbind with the original dataset.

 rbind(df1, data.frame(responses='Total', as.list(colSums(df1[-1]))))
 #    responses freq percent
 #1         A    9    25.7
 #2         B   13    37.1
 #3         C   10    28.6
 #4         D    3     8.6
 #5     Total   35   100.0

data

df1 <- structure(list(responses = c("A", "B", "C", "D"), freq = c(9L, 
13L, 10L, 3L), percent = c(25.7, 37.1, 28.6, 8.6)), 
.Names = c("responses", "freq", "percent"), class = "data.frame", 
 row.names = c(NA, -4L))

Solution 2:[2]

This might be relevant, using SciencesPo package, see this example:

library(SciencesPo)
tab(mtcars,gear,cyl) 

#output

=================================
              cyl                
      --------------------       
gear    4      6      8    Total 
---------------------------------
3          1      2     12     15
        6.7%    13%    80%   100%
4          8      4      0     12
       66.7%    33%     0%   100%
5          2      1      2      5
       40.0%    20%    40%   100%
---------------------------------
Total     11      7     14     32
       34.4%    22%    44%   100%
=================================

Chi-Square Test for Independence

Number of cases in table: 32 
Number of factors: 2 
Test for independence of all factors:
    Chisq = 18.036, df = 4, p-value = 0.001214
    Chi-squared approximation may be incorrect
                    X^2 df   P(> X^2)
Likelihood Ratio 23.260  4 0.00011233
Pearson          18.036  4 0.00121407

Phi-Coefficient   : NA 
Contingency Coeff.: 0.6 
Cramer's V        : 0.531 

Solution 3:[3]

@akrun I posted it but you already did the same. Correct me if I'm wrong, I think we can just need this without creating a new data frame or using as.list.

rbind(df1, c("Total", colSums(df1[-1])))

Output:

 responses freq percent
1         A    9    25.7
2         B   13    37.1
3         C   10    28.6
4         D    3     8.6
5     Total   35     100

sqldf

Classes of the data frame are preserved.
library(sqldf)
sqldf("SELECT * FROM df1
      UNION 
      SELECT 'Total', SUM(freq) AS freq, SUM(percent) AS percent FROM df1") 

Solution 4:[4]

Or, alternatively you can use margin.table and rbind function within R-base. Two lines and voila...

PS: The lines here are longer as I am recreating the data, but you know what I mean :-)

Data

df1 <- matrix(c(9,25.7,13,37.1,10,28.6,3,8.6),ncol=2,byrow=TRUE)
colnames(df1) <- c("freq","percent") 
rownames(df1) <- c("A","B","C","D") 

Creating Total Calculation

Total <- margin.table(df1,2)

Combining Total Calculation to Original Data

df2 <- rbind(df,Total)
df2

Solution 5:[5]

Inelegant but it gets the job done, please provide reproducible data frames so we don't have to build them first:

data = data.frame(letters[1:4], c(9,13,10,3), c(25.7,37.1, 28.6, 8.6))
colnames(data) = c("X","Y","Z")

data = rbind(data[,1:3], matrix(c("Sum",lapply(data[,2:3], sum)), nrow = 1)[,1:3])

Solution 6:[6]

library(janitor)
dat %>%
  adorn_totals("row")

 responses freq percent
         A    9    25.7
         B   13    37.1
         C   10    28.6
         D    3     8.6
     Total   35   100.0

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Community
Solution 2 zx8754
Solution 3
Solution 4 greta_mj
Solution 5 rg255
Solution 6 Sam Firke