dplyr: put count occurrences into new variable [duplicate]

I am trying to get the hang of dplyr, but cannot figure this out. I have seen similar issues described for many variables ("summarizing counts of a factor with dplyr" and "Putting rowwise counts of value occurences into new variables, how to do that in R with dplyr?"), but my task is somewhat smaller.
Given a data frame, how do I count the frequency of each value of a variable and place that count in a new variable?

set.seed(9)
df <- data.frame(
    group=c(rep(1,5), rep(2,5)),
    var1=round(runif(10,1,3),0))

Then we have:

>df
   group var1
1      1    1
2      1    1
3      1    1
4      1    1
5      1    2
6      2    1
7      2    2
8      2    2
9      2    2
10     2    3

I would like a third column giving, within each group (group), how many times each value of var1 occurs. In this example that would be count = (4,4,4,4,1,1,3,3,3,1). I tried things like the following, without success:

df %>%  group_by(group) %>% rowwise() %>% do(count = nrow(.$var1))

Explanations are much appreciated!



Solution 1:[1]

All you need to do is group your data by both columns, "group" and "var1":

df %>% group_by(group, var1) %>% mutate(count = n())
#Source: local data frame [10 x 3]
#Groups: group, var1
#
#   group var1 count
#1      1    1     4
#2      1    1     4
#3      1    1     4
#4      1    1     4
#5      1    2     1
#6      2    1     1
#7      2    2     3
#8      2    2     3
#9      2    2     3
#10     2    3     1
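
For reference, here is a self-contained version of the call above, as a minimal sketch assuming dplyr is installed. The data frame is typed out from the printed output in the question rather than regenerated with runif(), and the trailing ungroup() is an addition (not part of the original answer) so that the result does not stay grouped for later operations.

```r
library(dplyr)

# Data copied from the printed data frame in the question
df <- data.frame(
  group = c(rep(1, 5), rep(2, 5)),
  var1  = c(1, 1, 1, 1, 2, 1, 2, 2, 2, 3)
)

result <- df %>%
  group_by(group, var1) %>%  # one group per (group, var1) combination
  mutate(count = n()) %>%    # n() is the number of rows in the current group
  ungroup()                  # assumption: drop the grouping afterwards

result$count
```

Because mutate() keeps every row, the count is repeated on each row of its (group, var1) combination, which is the layout the question asks for.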

Edit after comment

Here's an example of how you SHOULD NOT DO IT:

df %>% group_by(group, var1) %>% do(data.frame(., count = length(.$group)))

The dplyr implementation with n() is faster, cleaner and shorter, and should be preferred over implementations like the one above.

Solution 2:[2]

Perhaps this is new functionality, but it can be done with one dplyr command:

df %>% add_count(group, var1)
   group  var1     n
 1     1     1     4
 2     1     1     4
 3     1     1     4
 4     1     1     4
 5     1     2     1
 6     2     1     1
 7     2     2     3
 8     2     2     3
 9     2     2     3
10     2     3     1
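
If the new column should be called count rather than n, add_count() accepts a name argument in reasonably recent dplyr versions. A minimal sketch, assuming such a version is installed and with the data typed out from the question's printed output:

```r
library(dplyr)

# Data copied from the printed data frame in the question
df <- data.frame(
  group = c(rep(1, 5), rep(2, 5)),
  var1  = c(1, 1, 1, 1, 2, 1, 2, 2, 2, 3)
)

# add_count() groups by the given columns, adds the group size to every
# row, and returns the data ungrouped if the input was ungrouped.
counted <- df %>% add_count(group, var1, name = "count")

counted$count
```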

Solution 3:[3]

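Another handy dplyr function is tally(). Note that it summarises the data to one row per group/var1 combination rather than adding a column to the original rows: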

df %>% group_by(group, var1) %>% tally()
# Source: local data frame [5 x 3]
# Groups: group
# 
#   group var1 n
# 1     1    1 4
# 2     1    2 1
# 3     2    1 1
# 4     2    2 3
# 5     2    3 1
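
Since tally() returns one row per combination, getting the per-row column the question asks for takes one more step. One possible way, sketched here as an assumption rather than as part of the original answer, is to join the summary back onto the original data:

```r
library(dplyr)

# Data copied from the printed data frame in the question
df <- data.frame(
  group = c(rep(1, 5), rep(2, 5)),
  var1  = c(1, 1, 1, 1, 2, 1, 2, 2, 2, 3)
)

# Summary table: one row per (group, var1) combination
counts <- df %>%
  group_by(group, var1) %>%
  tally(name = "count") %>%
  ungroup()

# left_join() keeps every row of df, in its original order,
# and attaches the matching count to each row
joined <- left_join(df, counts, by = c("group", "var1"))

joined$count
```

This is more work than mutate(count = n()) or add_count(), so it is mainly useful when the summary table is needed anyway.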

Solution 4:[4]

Two alternatives:

1: with base R:

# option 1:
df$count <- ave(df$var1, df$var1, df$group, FUN = length)
# option 2:
df <- transform(df, count = ave(var1, var1, group, FUN = length))

which gives:

> df
   group var1 count
1      1    1     4
2      1    1     4
3      1    1     4
4      1    1     4
5      1    2     1
6      2    1     1
7      2    2     3
8      2    2     3
9      2    2     3
10     2    3     1
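
A note on how ave() works here: its first argument only supplies the vector that is split by the grouping variables, FUN is applied to each piece, and the results are written back into a copy of that first argument. This means the result takes on the type of the first argument. A variant that does not depend on the type of var1, offered as a sketch rather than as part of the original answer, passes a row index instead:

```r
# Data copied from the printed data frame in the question
df <- data.frame(
  group = c(rep(1, 5), rep(2, 5)),
  var1  = c(1, 1, 1, 1, 2, 1, 2, 2, 2, 3)
)

# seq_along() gives an integer vector with one element per row; ave()
# splits it by (group, var1) and replaces each piece with its length.
df$count <- ave(seq_along(df$var1), df$group, df$var1, FUN = length)

df$count
```

With this form the counts come back as integers even if var1 is later stored as a character or factor column.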

2: with data.table:

library(data.table)
setDT(df)[, count := .N, by = .(group, var1)]

which gives the same result:

> df
    group var1 count
 1:     1    1     4
 2:     1    1     4
 3:     1    1     4
 4:     1    1     4
 5:     1    2     1
 6:     2    1     1
 7:     2    2     3
 8:     2    2     3
 9:     2    2     3
10:     2    3     1
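
Note that setDT() converts df to a data.table by reference, so the original object itself is changed. If the original data frame should stay untouched, one option (a sketch, assuming data.table is installed) is to work on a copy made with as.data.table():

```r
library(data.table)

# Data copied from the printed data frame in the question
df <- data.frame(
  group = c(rep(1, 5), rep(2, 5)),
  var1  = c(1, 1, 1, 1, 2, 1, 2, 2, 2, 3)
)

# as.data.table() returns a new data.table; df itself is not modified
dt <- as.data.table(df)

# .N is the number of rows in each (group, var1) group;
# := adds it as a new column to dt by reference
dt[, count := .N, by = .(group, var1)]

dt$count
```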

If you want to summarise, you can use:

# with base R:
aggregate(id ~ group + var1, transform(df, id = 1), length)

# with 'dplyr':
count(df, group, var1)

# with 'data.table':
setDT(df)[, .N, by = .(group, var1)]

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Black Adder
Solution 3 KFB
Solution 4