Mean by factor by level
Maybe this is simple, but I can't find an answer on the web. I have a problem calculating a mean per factor level. My data typically looks like this:
factor, value
a,1
a,2
b,1
b,1
b,1
c,1
I want a vector A that contains the mean for level "a" only. If I type A at the console, I want to get 1.5. The method for calculating the mean must use factors.
Thank you in advance for help.
Solution 1:[1]
Take a look at tapply, which lets you split a vector according to one or more factors and apply a function to each subset:
> dat<-data.frame(factor=sample(c("a","b","c"), 10, TRUE), value=rnorm(10))
> r1<-with(dat, tapply(value, factor, mean))
> r1
a b c
0.3877001 -0.4079463 -1.0837449
> r1[["a"]]
[1] 0.3877001
You can access your results using r1[["a"]] etc.
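Because the session above uses random data, here is a minimal sketch of the same tapply approach on the six rows from the question. The names `dat`, `level_means`, and `A` are chosen for illustration; `stringsAsFactors = TRUE` is an assumption made so that the `factor` column is a real factor on R 4.0 and later.

```r
# The sample data from the question, rebuilt by hand.
dat <- data.frame(
  factor = c("a", "a", "b", "b", "b", "c"),
  value  = c(1, 2, 1, 1, 1, 1),
  stringsAsFactors = TRUE  # keep `factor` as a factor column
)

# One mean per factor level, returned as a named 1-d array
level_means <- with(dat, tapply(value, factor, mean))

# Keep only the mean for level "a"
A <- level_means[["a"]]
print(A)  # 1.5
```

Typing A at the console then shows the mean of the two "a" rows, (1 + 2) / 2.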
Alternatively, one of the popular R packages (plyr) has very nice ways of doing this.
> library(plyr)
> r2<-ddply(dat, .(factor), summarize, mean=mean(value))
> r2
factor mean
1 a 0.3877001
2 b -0.4079463
3 c -1.0837449
> subset(r2,factor=="a",select="mean")
mean
1 0.3877001
You can also use dlply, which takes a data frame and returns a list:
> dlply(dat, .(factor), summarize, mean=mean(value))$a
mean
1 0.3877001
Solution 2:[2]
The following code computes the mean of value over the rows where factor equals "a" (data is the data frame):
mean(data$value[data$factor == "a"])
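To make the subsetting step visible, the one-liner can be split in two, sketched here on the data from the question. The data frame is called `dat` rather than `data` in this sketch, and `is_a` is an illustrative name for the intermediate logical vector.

```r
# The sample data from the question
dat <- data.frame(
  factor = c("a", "a", "b", "b", "b", "c"),
  value  = c(1, 2, 1, 1, 1, 1)
)

# Logical vector marking the rows that belong to level "a"
is_a <- dat$factor == "a"

# Mean of the values in those rows only
A <- mean(dat$value[is_a])
print(A)  # 1.5
```

The comparison works the same way whether the `factor` column is stored as a factor or as a character vector.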
Solution 3:[3]
Another simple possibility would be the "by" function (assuming value and factor are available as vectors, for example inside with()):
by(value, factor, mean)
You can get the mean of factor level "a" by:
factor_means <- by(value, factor, mean)
factor_means[attr(factor_means, "dimnames")$factor=="a"]
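The calls above assume that `value` and `factor` exist as standalone vectors. A minimal sketch on the data from the question, assuming the columns live in a data frame named `dat` (a name chosen here for illustration), wraps the call in with() so that the columns are found:

```r
# The sample data from the question
dat <- data.frame(
  factor = c("a", "a", "b", "b", "b", "c"),
  value  = c(1, 2, 1, 1, 1, 1)
)

# Evaluate by() inside the data frame so `value` and `factor` resolve to its columns
factor_means <- with(dat, by(value, factor, mean))

# Extract the mean for level "a" by name
A <- factor_means[["a"]]
print(A)  # 1.5
```

Indexing by name with `[["a"]]` is used here as a shorter alternative to comparing against the dimnames; both select the same element.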
Solution 4:[4]
You can use ddply and pass summary as the function, which reports the mean alongside the other summary statistics for each level.
library(plyr) # load the plyr package
ddply(nameOfTheDataframe, ~ factor, function(data) summary(data$value))
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Thomas |
| Solution 3 | Ruediger Ziege |
| Solution 4 | noone |
