Replacing a for-loop for conditional if-statement based assignments

I have been trying to write a function that checks a huge data frame row by row using a for-loop and if-statements, and appends each matching row to a new data frame. This is the general structure of the code I've been using:

vgm.bulk <- data.frame()
for (i in 1:nrow(data.df)) {
  if (data.df$bestCHit[i] == "IGH" | data.df$bestCHit[i] == "") {
    local.df <- data.frame(
      VDJ_vgene = data.df$vgene[i],
      VJ_vgene = NA)
    vgm.bulk <- rbind(vgm.bulk, local.df)
  } else if (data.df$bestCHit[i] == "IGL" | data.df$bestCHit[i] == "IGK") {
    local.df <- data.frame(
      VDJ_vgene = NA,
      VJ_vgene = data.df$vgene[i])
    vgm.bulk <- rbind(vgm.bulk, local.df)
  }
}

In reality, I have to do this conditional assignment with more than 30 columns. While the code works, I found that my runtime with this approach of checking every row sequentially is far too slow and not usable (taking ~20 minutes for a dataset with 25000 rows).

Therefore, is there a way to forgo the for-loop and do the assignment more efficiently? I'd appreciate any advice.



Solution 1:[1]

Here is a vectorized version of the for loop above. There is no need for a loop or for rbind, which copies the growing data frame on every iteration.

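# Start with all-NA columns, one entry per row of data.df.
VDJ_vgene <- rep(NA, nrow(data.df))
VJ_vgene <- rep(NA, nrow(data.df))

# Rows where bestCHit is "IGH" or empty fill the VDJ column.
i <- data.df$bestCHit %in% c("IGH", "")
VDJ_vgene[i] <- data.df$vgene[i]

# Rows where bestCHit is "IGL" or "IGK" fill the VJ column.
i <- data.df$bestCHit %in% c("IGL", "IGK")
VJ_vgene[i] <- data.df$vgene[i]

# Rows matching neither group are kept, with NA in both columns.
vgm.bulk <- data.frame(VDJ_vgene, VJ_vgene)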
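
Since the question mentions more than 30 columns, the same logical-index idea can probably be applied to all of them at once. The following is only a sketch under assumptions: split_by_chain is a hypothetical helper, the jgene column and the "VDJ_"/"VJ_" naming are made up for illustration, and rows that match neither group are dropped to mirror the original loop.

```r
# Hypothetical helper (not from the original answer): masks a set of columns
# by chain type using logical indexing instead of a row-by-row loop.
split_by_chain <- function(data.df, cols,
                           heavy = c("IGH", ""),
                           light = c("IGL", "IGK")) {
  is_heavy <- data.df$bestCHit %in% heavy
  is_light <- data.df$bestCHit %in% light

  # Set every value outside `keep` to NA, preserving each column's type.
  mask <- function(df, keep) {
    df[] <- lapply(df, function(x) replace(x, !keep, NA))
    df
  }

  vdj <- mask(data.df[cols], is_heavy)
  vj  <- mask(data.df[cols], is_light)
  names(vdj) <- paste0("VDJ_", cols)
  names(vj)  <- paste0("VJ_", cols)

  # Keep only rows that matched one of the two groups, as the loop does.
  out <- cbind(vdj, vj)[is_heavy | is_light, , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Toy data; jgene is an assumed second column for illustration.
data.df <- data.frame(
  bestCHit = c("IGH", "IGK", "", "IGL", "TRB"),
  vgene    = c("V1", "V2", "V3", "V4", "V5"),
  jgene    = c("J1", "J2", "J3", "J4", "J5")
)

vgm.bulk <- split_by_chain(data.df, cols = c("vgene", "jgene"))
print(vgm.bulk)
```

Note that %in% returns FALSE for NA values in bestCHit, so such rows are silently dropped here, whereas the original if-statement would stop with an error on them.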

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Rui Barradas