R: How to apply a custom function row-wise to a dataframe (avoiding a for loop)

I need help rewriting my function randomdraws() (see below), which currently relies on a repeat loop nested inside a for loop. It takes a long time on my bigger datasets, mostly because of the for loop. Additionally, I need to run this function x times and store the results in a list object.

Here is what I am trying to achieve: I have two dataframes (here df_1 and df_2) which I need as input for my function randomdraws(). The interesting part of this function begins in the repeat{} section. There I draw values from the extreme value distribution (evd), one per column of df_1 (i.e. length(df_1) values). I then add these values (called evd_draw) to the values of one row of df_1 and check whether the altered values fulfill a certain condition: varY, the name of the column holding the largest altered value, must equal varX from df_2 (varX==varY). If the condition is not met with this random draw added, I repeat the draw until it is met. Once it is met, I store the evd_draw that satisfied the condition. I iterate this over each row of df_1. In the end I get a new dataframe with the "stored" random draws of the evd per row of df_1 that fulfilled the condition varX==varY. In my example below, with only 10 observations, my code runs just fine.

But: as the number of rows and columns of df_1 (and df_2) grows, randomdraws() gets very slow. I therefore need another solution that performs the repeat-loop calculation for each row of df_1. I think I need to parallelize my computations instead of iterating over the rows one after another, but I fail at (i) rewriting the repeat part of my function for this and (ii) using it in functions like apply()/map()/...

QUESTION: Is there a way to get my result (i.e. a dataframe/list of the random draws that fulfilled the condition, computed from dataframes df_1 and df_2) without the for loop, and quickly enough for large datasets/dataframes?

Example data:

df_1 <- as.data.frame(rbind(c(0.23040,0.10153,0.28394,0.17105,0.00125),
                            c(0.11839,0.16768 ,0.26914 ,0.19163,0.00126),
                            c(0.11703,0.18245 ,0.16571 ,0.16223,0.00423),
                            c(0.39406,0.08535 ,0.21181 ,0.12780,0.00039),
                            c(0.16097 ,0.16369, 0.23839, 0.17830,0.00158),
                            c(0.39812 ,0.04525, 0.17583, 0.09064,0.00167),
                            c(0.30547 ,0.10900, 0.18930 ,0.12665,0.00197),
                            c(0.19357 ,0.17854, 0.18003 ,0.19576,0.00189),
                            c(0.19466 ,0.17339, 0.21267 ,0.18410,0.00069),
                            c(0.07884 ,0.21299 ,0.18480 ,0.17908,0.00178)))
colnames(df_1) <- c("xf0m40","xf30m40","xf10m40","xf20m40","xf40m0") 
rownames(df_1) <- c(2,7,21,33,50,77,80,96,102,110)

df_2 <- cbind.data.frame(varX=c("xf0m40","xf30m40","xf10m40","xf0m40","xf20m40","xf0m40","xf0m40","xf40m0","xf10m40","xf30m40"),
id=c(2,7,21,33,50,77,80,96,102,110))

Function (that runs smoothly but is too slow):

randomdraws <- function(df_1, df_2) {
    library(tidyverse)  # library() fails loudly if the package is missing
    library(EnvStats)   # provides revd()
    
    dfx <- df_1 #here df_1 is actually retrieved from fitted values of regression output,
                # simplified here for the sake of clarity
    df <- df_2  #select two variables from separate dataframe df_2
                #(already simplified here), where varX is a character var, id is numeric

    # matrix containing only 0; to be filled with rowwise iteration
    df_evd <- matrix(0, nrow = nrow(dfx), ncol = ncol(dfx), byrow = TRUE)
    colnames(df_evd) <- colnames(dfx)
    rownames(df_evd) <- rownames(dfx)
    
    for (i in 1:nrow(dfx)){
            
            repeat {
                # draw one evd value per column of dfx
                evd_draw <- revd(ncol(dfx), scale = .5)
                # add the draw to row i of dfx, attach varX and id from df_2, and
                # compute varY (name of the column holding the row maximum);
                # named row_check so that base::t() is not masked
                row_check <- as.data.frame(dfx[i,] + evd_draw) %>% bind_cols(df[i,]) %>%
                    mutate(varY=as.character(pmap(across(1:ncol(dfx)),~ names(c(...)[which.max(c(...))]))),
                           overlap=ifelse(varX == varY,1,0))

                df_evd[i,] <- evd_draw

                # repeat this section until varX == varY (i.e. overlap == 1) for row i
                if (row_check$overlap == 1) break
            }
        }
        
    return(df_evd)
    
}

Apply function on data:

system.time(exampledf <- randomdraws(df_1, df_2))

#replicate this function 3 times (takes even longer then!)
ls_example <- replicate(3, list(as.data.frame(randomdraws(df_1, df_2))), simplify=TRUE)
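
One possible direction, as a minimal sketch rather than a tested solution: the per-row check only asks whether the column named in varX holds the maximum of row + draw. That can be done on a plain matrix for all unfinished rows at once, redrawing only the rows that still fail. Each row still receives its first draw that satisfies the condition, so the sampling scheme is unchanged. The names randomdraws_vec, rgumbel and max_iter are hypothetical. rgumbel is a stand-in that assumes revd() samples the Gumbel (largest extreme value) distribution, so that the sketch runs without extra packages.

```r
# Stand-in for EnvStats::revd() (assumption: Gumbel / largest extreme value),
# implemented with the inverse CDF so no package is needed.
rgumbel <- function(n, location = 0, scale = 1) {
  location - scale * log(-log(runif(n)))
}

# Hypothetical vectorized variant of randomdraws(): rejection sampling on a
# matrix, redrawing only the rows whose condition is not yet met.
randomdraws_vec <- function(df_1, df_2, scale = 0.5, max_iter = 10000L) {
  x <- as.matrix(df_1)
  # column index that must hold the row maximum (varX == varY)
  target <- match(df_2$varX, colnames(x))
  stopifnot(nrow(x) == nrow(df_2), !anyNA(target))

  out <- matrix(NA_real_, nrow(x), ncol(x), dimnames = dimnames(x))
  todo <- seq_len(nrow(x))  # rows still waiting for an accepted draw
  iter <- 0L

  while (length(todo) > 0L && iter < max_iter) {
    iter <- iter + 1L
    # one draw per cell for every unfinished row
    draw <- matrix(rgumbel(length(todo) * ncol(x), scale = scale),
                   nrow = length(todo))
    # index of the largest value per row after adding the draw
    winner <- max.col(x[todo, , drop = FALSE] + draw, ties.method = "first")
    ok <- winner == target[todo]
    if (any(ok)) out[todo[ok], ] <- draw[ok, , drop = FALSE]
    todo <- todo[!ok]  # keep only the rows that failed this round
  }

  if (length(todo) > 0L) warning("some rows never met the condition")
  out
}
```

With the example data above, replicate(3, as.data.frame(randomdraws_vec(df_1, df_2)), simplify = FALSE) should return a list of three dataframes. The max_iter cap is an added safeguard that the original function does not have. Rows that hit it are left as NA.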


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
