How to count how many values per level in a given factor?

I have a data.frame mydf with about 2500 rows. These rows correspond to 69 classes of objects in column 1, mydf$V1, and I want to count how many rows I have per object class. I can get a factor of these classes with:

objectclasses = unique(factor(mydf$V1, exclude="1"));

What's the terse R way to count the rows per object class? If this were any other language I'd be traversing an array with a loop and keeping count but I'm new to R programming and am trying to take advantage of R's vectorised operations.



Solution 1:[1]

Here are two ways to do it:

set.seed(1)
tt <- sample(letters, 100, replace = TRUE)

## using table
table(tt)
tt
a b c d e f g h i j k l m n o p q r s t u v w x y z 
2 3 3 3 2 4 6 1 6 5 6 4 7 2 2 2 5 4 5 3 8 4 5 4 3 1 
## using tapply
tapply(tt,tt,length)
a b c d e f g h i j k l m n o p q r s t u v w x y z 
2 3 3 3 2 4 6 1 6 5 6 4 7 2 2 2 5 4 5 3 8 4 5 4 3 1 
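Applied to the data frame from the question, the same idea might look like the sketch below. The contents of `mydf` here are made up for illustration; only the column name `V1` and the excluded label "1" come from the question.

```r
# Hypothetical stand-in for the asker's data frame; V1 holds the class labels.
mydf <- data.frame(
  V1 = c("a", "b", "a", "c", "1", "b", "a"),
  stringsAsFactors = FALSE
)

# Count rows per class directly on the column.
counts <- table(mydf$V1)

# Drop the class labelled "1", as in the question.
counts_excl <- table(mydf$V1, exclude = "1")

# Sort from most to least frequent.
sorted_counts <- sort(counts_excl, decreasing = TRUE)
sorted_counts
```

There is no need to call unique() first: table() works on the raw column and returns one count per distinct value.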

Solution 2:[2]

Using plyr package:

library(plyr)

count(mydf$V1)

It returns a data frame with the frequency of each value.
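If plyr is not installed, a similar two-column result can be sketched in base R by converting a table to a data frame. This is a swapped-in base R alternative, not plyr's own implementation; the vector `v` and the column names `x` and `freq` are chosen here for illustration.

```r
# Hypothetical example vector standing in for mydf$V1.
v <- c("a", "b", "a", "c", "b", "a")

# Base R: tabulate, then convert to a two-column data frame.
# Naming the table argument (x = v) sets the name of the value column.
freq_df <- as.data.frame(table(x = v), responseName = "freq",
                         stringsAsFactors = FALSE)
freq_df
```

A data frame is often more convenient than a table object when the counts need to be merged with other data or filtered.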

Solution 3:[3]

Using data.table

 library(data.table)
 setDT(dat)[, .N, keyby=ID] #(Using @Paul Hiemstra's `dat`)

Or using dplyr 0.3

 library(dplyr)
 res <- count(dat, ID)
 head(res)
 #Source: local data frame [6 x 2]

 #  ID n
 #1  a 2
 #2  b 3
 #3  c 3
 #4  d 3
 #5  e 2
 #6  f 4

Or

  dat %>% 
      group_by(ID) %>% 
      tally()

Or

  dat %>% 
      group_by(ID) %>%
      summarise(n=n())

Solution 4:[4]

We can use summary() on a factor column:

summary(myDF$factorColumn)
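Note that this only gives counts when the column really is a factor. A minimal sketch of the difference, using a made-up vector `labels_chr`:

```r
# Hypothetical column of class labels stored as character strings.
labels_chr <- c("a", "b", "a", "c", "b", "a")

# On a character vector, summary() reports length, class and mode, not counts.
summary(labels_chr)

# On a factor, summary() returns the count for each level.
labels_fct <- factor(labels_chr)
level_counts <- summary(labels_fct)
level_counts
```

By default summary() on a factor shows at most 100 levels (argument `maxsum`) and lumps the rest into "(Other)", so the 69 classes in the question fit, but a larger factor would need a higher `maxsum`.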

Solution 5:[5]

One more approach is to use dplyr's n() function, which counts the number of observations in each group:

library(dplyr)
library(magrittr)
data %>% 
  group_by(columnName) %>%
  summarise(Count = n())

Solution 6:[6]

In case I just want to know how many unique factor levels exist in the data, I use:

length(unique(df$factorcolumn))
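This counts the distinct values actually present, which can differ from the number of declared levels when a factor carries unused levels (for example after subsetting). A small sketch with a made-up factor `f`:

```r
# Hypothetical factor; subsetting keeps the original set of levels.
f <- factor(c("a", "b", "c", "a"))
f_sub <- f[f != "c"]

n_present  <- length(unique(f_sub))      # distinct values actually present
n_declared <- nlevels(f_sub)             # declared levels, including unused "c"
n_dropped  <- nlevels(droplevels(f_sub)) # levels after dropping unused ones

c(present = n_present, declared = n_declared, dropped = n_dropped)
```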

Solution 7:[7]

Use the package plyr with lapply to get frequencies for every value (level) and every variable (factor) in your data frame.

library(plyr)
lapply(df, count)

Solution 8:[8]

This is an old post, but you can do this with base R and no data frames/data tables:

sapply(levels(yTrain), function(sLevel) sum(yTrain == sLevel))
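One caveat with this form is missing values: comparing against NA yields NA, so a single NA in the factor turns every count into NA. A hedged sketch with a made-up `yTrain`, adding `na.rm = TRUE`:

```r
# Hypothetical factor with an unused level ("d") and a missing value.
yTrain <- factor(c("a", "b", "a", NA, "c"), levels = c("a", "b", "c", "d"))

# As written above, one NA makes every count NA.
raw_counts <- sapply(levels(yTrain), function(sLevel) sum(yTrain == sLevel))

# With na.rm = TRUE the counts are recovered; unused levels get 0.
level_counts <- sapply(levels(yTrain),
                       function(sLevel) sum(yTrain == sLevel, na.rm = TRUE))
level_counts
```

Because it iterates over levels(), this approach also reports a zero for levels that never occur in the data.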

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 agstudy
Solution 2 zx8754
Solution 3 Arun
Solution 4 zx8754
Solution 5 iamigham
Solution 6 Peter
Solution 7 Christian Savemark
Solution 8 Victor