More efficient way to create a new variable using a for loop

I would like to know if there is a more efficient way to do this, since my dataset is millions of elements long and has been stuck on this step for days.

for (i in 1:32000000){
  if (!exists("df")){
    df <- as.data.frame(Properties[[i]])
    df <- as.data.frame(t(df))
  }else{
    temp_dataset <- as.data.frame(Properties[[i]])
    temp_dataset <- as.data.frame(t(temp_dataset))
    df <- rbind(df, temp_dataset)
    rm(temp_dataset)
  }
}

Basically, the loop creates a new variable, df, and appends one row to it for each value of i in 1:32000000. But, as I said, it takes a lot of time, so I need a more efficient way to do it.

Properties looks like:

List of 32000000
 $ : Named num [1:3] -0.85 -0.544 0.208
  ..- attr(*, "names")= chr [1:3] "PP1" "PP2" "PP3"
 $ : Named num [1:3] -0.332 -0.698 0.264
  ..- attr(*, "names")= chr [1:3] "PP1" "PP2" "PP3"
 $ : Named num [1:3] -0.768 -0.486 0.184
  ..- attr(*, "names")= chr [1:3] "PP1" "PP2" "PP3"
 $ : Named num [1:3] -0.458 -0.57 -0.054
  ..- attr(*, "names")= chr [1:3] "PP1" "PP2" "PP3"
 $ : Named num [1:3] -0.536 -0.458 0.348
  ..- attr(*, "names")= chr [1:3] "PP1" "PP2" "PP3"
 $ : Named num [1:3] -0.47 -0.776 0.06


Solution 1:[1]

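You can try using transpose() from data.table. It turns the list of row vectors into a list of column vectors in a single call, so nothing is grown row by row, and it should be pretty fast.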

Sample data:

n <- 100000

Properties <- replicate(n, setNames(runif(3), c("PP1", "PP2", "PP3")), simplify = FALSE)

head(Properties, 3)

# [[1]]
#       PP1       PP2       PP3 
# 0.8036237 0.9423731 0.9593770 
# 
# [[2]]
#       PP1       PP2       PP3 
# 0.1906879 0.5571697 0.9718734 
# 
# [[3]]
#       PP1       PP2       PP3 
# 0.7542362 0.3420677 0.4541527

Stacking code:

df <- as.data.frame(data.table::transpose(Properties),
                    col.names = c("PP1", "PP2", "PP3"))
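
As a quick sanity check, the sketch below compares the transpose() result with do.call(rbind, ...) on a small made-up sample. The names small, df_fast and df_ref are illustrative and not part of the original answer. It assumes data.table is installed and that every list element has the same length and the same names.

```r
# Small made-up sample with the same shape as Properties (illustrative only)
set.seed(42)
small <- replicate(5, setNames(runif(3), c("PP1", "PP2", "PP3")), simplify = FALSE)

# Fast path: turn the list of rows into a list of columns, then into a data frame
df_fast <- as.data.frame(data.table::transpose(small),
                         col.names = c("PP1", "PP2", "PP3"))

# Reference: bind the vectors row by row into a matrix, then convert
df_ref <- as.data.frame(do.call(rbind, small))

# Both approaches should produce the same 5 x 3 data frame
stopifnot(isTRUE(all.equal(df_fast, df_ref)))
```

If the check passes on a small slice of the real list (for example Properties[1:1000]), the same call can then be run on the full list.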

Benchmark against do.call(rbind, Properties), using the sample data above (n = 100000):

microbenchmark::microbenchmark(
  do.call = do.call(rbind, Properties),
  data.table = as.data.frame(data.table::transpose(Properties),
                             col.names = c("PP1", "PP2", "PP3")))

# Unit: milliseconds
#        expr     min       lq       mean   median        uq      max neval
#     do.call 74.2183 83.29040 107.001017 96.63925 113.61070 322.4556   100
#  data.table  4.6864  5.06845   6.163916  5.30285   5.56845  73.3627   100
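
If installing data.table is not an option, a base R alternative is to flatten the list once and fill a matrix by row. This is a minimal sketch, not the approach from the answer above and not benchmarked here. It assumes every element is a numeric vector of the same length with names in the same order; the names props, m and df_base are illustrative.

```r
# Small made-up sample with the same shape as Properties (illustrative only)
set.seed(1)
props <- replicate(5, setNames(runif(3), c("PP1", "PP2", "PP3")), simplify = FALSE)

# unlist() concatenates the elements in order, so filling the matrix by row
# puts each list element on its own row
m <- matrix(unlist(props, use.names = FALSE),
            ncol = length(props[[1]]), byrow = TRUE,
            dimnames = list(NULL, names(props[[1]])))

# Convert once at the end instead of growing a data frame inside a loop
df_base <- as.data.frame(m)
```

The point is the same as with transpose(): the result is allocated once, instead of copying the whole data frame on every rbind() call.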

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Adam