Mean and Standard Deviation of x>=5 of 10000 data points binomial(10, 1/4)
I have a sample of 10,000 data points, generated with:
data = rbinom(10000, size=10, prob=1/4)
I need to find the mean and standard deviation of the data values >=5.
There are approximately 766 such values, counted with:
sum(data >=5)
The comparison data >= 5 (as used in sum above, or in any other approach I can think of) produces TRUE/FALSE values rather than the data values, so I can't use it directly in a mean or sd calculation. How do I extract the actual values?
Solution 1:[1]
data >= 5 on its own returns a logical vector telling you which values of data are greater than or equal to 5. To get the values themselves, use that logical vector as an index: data[data >= 5].
So we can do:
data = rbinom(10000, size=10, prob=1/4)
mean(data[data >= 5])
#> [1] 5.298153
sd(data[data >= 5])
#> [1] 0.5567141
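To see what the indexing step does, here is a minimal sketch on a small toy vector. The vector x and the names keep and big are made up for illustration and are not part of the original question.

```r
# Toy vector standing in for the simulated data (hypothetical values)
x <- c(2, 5, 7, 1, 6)

# The comparison alone yields a logical vector, one entry per element:
# FALSE TRUE TRUE FALSE TRUE
keep <- x >= 5
print(keep)

# Using that logical vector as an index keeps only the matching values: 5 7 6
big <- x[keep]
print(big)

# The summary statistics are then computed on the values themselves
mean(big)  # 6
sd(big)    # 1
```

The same two steps, comparison and then indexing, are what data[data >= 5] performs in a single expression.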
Solution 2:[2]
Maybe try this:
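library(dplyr)
# Put the vector in a data frame with a named column,
# keep the rows where the value is at least 5, then summarise them
data.frame(x = data) %>%
  filter(x >= 5) %>%
  summarise(mean = mean(x),
            sd = sd(x))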
Output:
mean sd
1 5.297092 0.5815554
Data
data = rbinom(10000, size=10, prob=1/4)
Solution 3:[3]
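The TRUE and FALSE values can be used in mean(), sum(), sd(), etc., because they are coerced to the numeric values 1 and 0, respectively. Note that these calls then summarise the logical vector itself: sum() counts the values that are at least 5 and mean() gives their proportion, not the mean of those values.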
set.seed(456)
data = rbinom(10000, size=10, prob=1/4)
mean(data >= 5)
#> [1] 0.0779
sum(data >= 5)
#> [1] 779
sd(data >= 5)
#> [1] 0.2680276
Created on 2022-05-14 by the reprex package (v2.0.1)
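The logical vector can still lead to the mean of the selected values if it is used both as an index and as a counter. The following is a rough sketch under that assumption; the names is_big, n_big, prop_big and mean_big are hypothetical and chosen only for this example.

```r
set.seed(456)  # same seed as above, so the draw is reproducible
data <- rbinom(10000, size = 10, prob = 1/4)

# Logical vector: TRUE where the value is at least 5
is_big <- data >= 5

# TRUE counts as 1, so sum() gives the number of such values
n_big <- sum(is_big)

# mean() of a logical vector is the fraction of TRUE entries
prop_big <- mean(is_big)

# Mean of the selected values: their total divided by their count
mean_big <- sum(data[is_big]) / n_big

# This agrees with subsetting first and then calling mean()
all.equal(mean_big, mean(data[is_big]))
```

Here prop_big describes how often a value is at least 5, while mean_big describes how large those values are on average, which is the quantity the question asks for.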
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Quinten |
| Solution 3 | |
