Running analysis on for loop x times
I have the following code that selects 4 rows of iris 1000x, and takes the mean of each 4 row sample:
library(dplyr)
iris<- iris
storage<- list()
counter<- 0
for (i in 1:1000) {
# sample 4 randomly selected rows; the loop repeats this 1000 times
tempsample<- iris[sample(1:nrow(iris), 4, replace=F),]
storage[[i]]=tempsample
counter<- counter+1
print(counter)
}
# Unpack results into dataframe
results<- do.call(rbind, storage)
View(results)
results_2<- as.data.frame(results)
results_2<- results_2 %>% mutate(Aggregate = rep(seq(1,ceiling(nrow(results_2)/4)),each = 4))
# View(results_2)
final_results<- aggregate(results_2[,1:4], list(results_2$Aggregate), mean)
# View(final_results)
I want to calculate the bias of each column in relation to their true population parameter. For example using SimDesign's bias():
library(SimDesign)
(bias(final_results[,2:5], parameter=c(5,3,2,1), type='relative'))*100
In this code, the values of parameter are hypothetical true pop. values of each column in the dataframe. I want to do this process 100x to get a distribution of bias estimates for each variable in the dataframe. However, I'm not sure how to fit all of this into a for loop (what I think would be the way to go) so the final output is a dataframe with 100 rows of bias measurements for each iris variable.
Any help with this would be greatly appreciated.
#------------------------------
Update
Trying to run the same code for a stratified sample as opposed to a random sample gives me the following error:
Error in `[.data.table`(setDT(copy(iris)), as.vector(sapply(1:1000, function(X) stratified(iris, :
i is invalid type (matrix). Perhaps in future a 2 column matrix could return a list of elements of DT
I think this might be related to setDT?
This is a result of the following code:
do.call(rbind,lapply(1:100, function(x) {
bias(
setDT(copy(iris))[as.vector(sapply(1:1000, function(X) stratified(iris,group="Species", size=1)))][
, lapply(.SD, mean), by=rep(c(1:1000),4), .SDcols=c(1:4)][,c(2:5)],
parameter=c(5,3,2,1),
type='relative'
)
}))
I looked into using the following code which was suggested:
get_samples <- function(n, sampsize=4) {
rbindlist(lapply(1:n, function(x) {
splitstackshape::stratified(iris, group="Species",sampsize)[, id:=x] }))[
, lapply(.SD, mean), by=.(Species, id)] }
I think I understand what this function is doing (selecting 4 stratified rows of iris, taking the means of each column by species), but I'm not sure how to apply it to the original question of doing it (4*1000)*100 to test the bias (I'm very new at this so apologies if I'm missing something obvious).
Solution 1:[1]
Since you are already using mutate, you may consider staying within the tidyverse.
library(dplyr)
library(purrr) # provides map_df
map_df(1:1000, ~ sample_n(iris, 4, replace = FALSE)) %>%
glimpse() %>%
mutate(Aggregate_col = rep(seq(1, ceiling(n() / 4)), each = 4)) %>%
glimpse() %>%
select(starts_with("Sepal"),
starts_with("Petal"),
matches("Aggregate")) %>%
group_by(Aggregate_col) %>%
summarise(across(.cols = everything(), ~ mean(.x, na.rm = TRUE)))
Notes:
In the example above, your first loop is replaced by:
map_df(1:1000, ~ sample_n(iris, 4, replace = FALSE))
map_x can be used to iterate over a list, or in this case an integer vector 1:1000, if the only intention is to call the function repeatedly and bind the results into a desired format, in this case a data.frame.
You can exploit glimpse while within the data transformation pipeline to avoid calling View repeatedly.
select provides a readable way of selecting columns by name or by partial match. This is usually a safer method than selecting columns by index while adding/removing variables.
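The question also asks how to repeat the whole procedure 100 times and collect one row of bias estimates per run. A minimal base R sketch of that outer loop follows; it is not part of the original answer. The helper names draw_rows and one_bias_run are hypothetical, the true values are the hypothetical ones from the question, and percent relative bias is computed by hand as (mean of estimates - parameter) / parameter * 100, which is my understanding of what bias(..., type = 'relative') * 100 reports.

```r
# Hypothetical "true" population values from the question, one per numeric column
true_values <- c(5, 3, 2, 1)
iris_num <- as.matrix(iris[, 1:4])  # numeric columns only; matrix indexing is fast

# Row indices for one sample: simple random, or samp_size rows per Species
draw_rows <- function(samp_size, by_species = FALSE) {
  if (!by_species) return(sample(nrow(iris), samp_size, replace = FALSE))
  strata <- split(seq_len(nrow(iris)), iris$Species)
  unlist(lapply(strata, sample, size = samp_size), use.names = FALSE)
}

# One run: n_samples sample means, then percent relative bias per column
one_bias_run <- function(n_samples = 1000, samp_size = 4, by_species = FALSE) {
  sample_means <- t(replicate(n_samples, {
    colMeans(iris_num[draw_rows(samp_size, by_species), , drop = FALSE])
  }))
  (colMeans(sample_means) - true_values) / true_values * 100
}

set.seed(1)  # only for reproducibility of the sketch
bias_results <- as.data.frame(t(replicate(100, one_bias_run())))
dim(bias_results)  # 100 rows, one column per iris measurement
```

For the stratified case in the update, one_bias_run(samp_size = 1, by_species = TRUE) would draw one row per species (3 rows per sample). This is a base R stand-in for the stratified draw, not the splitstackshape call itself.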
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
