'Get dplyr count of distinct in a readable way
I'm new using dplyr, I need to calculate the distinct values in a group. Here's a table example:
data=data.frame(aa=c(1,2,3,4,NA), bb=c('a', 'b', 'a', 'c', 'c'))
I know I can do things like:
by_bb<-group_by(data, bb, add = TRUE)
summarise(by_bb, mean(aa, na.rm=TRUE), max(aa), sum(!is.na(aa)), length(aa))
But if I want the count of unique elements?
I can do:
> summarise(by_bb,length(unique(unlist(aa))))
bb length(unique(unlist(aa)))
1 a 2
2 b 1
3 c 2
and if I want to exclude NAs I cand do:
> summarise(by_bb,length(unique(unlist(aa[!is.na(aa)]))))
bb length(unique(unlist(aa[!is.na(aa)])))
1 a 2
2 b 1
3 c 1
But it's a little unreadable for me. Is there a better way to do this kind of summarization?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
