Creating a loop in R for a function

I would like to create a for loop that repeats the same function for 150 variables. I am new to R and a bit stuck.

To give you an example of some commands I need to repeat:

N <- table(df$var1 == 0)["TRUE"]
n <- table(df$var1 == 1)["TRUE"]
PREV95 <- svyciprop(~var1 == 1, level = 0.95, design = design, deff = "replace") * 100

I need to run the same functions for 150 columns. I know that I need to put all my column names in one vector, x, but I don't know how to write the loop that repeats the same commands for every variable.

Can anyone help me to write a loop?


Solution 1:[1]

A word in advance: loops in R can in most cases be replaced with a faster, more idiomatic approach (various flavours of apply, mapping, walking ...).

Applying a function to the columns of a data frame df:

a) with base R, example dataset cars

my_function <- function(xs) max(xs)
lapply(cars, my_function)

b) tidyverse-style:

library(dplyr)

cars %>%
   summarise(across(everything(), my_function))
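
Applied to the counts in the question, the base R variant might look like the sketch below. The toy data frame, the column names and the helper count_01() are made up for illustration; x plays the role of the vector of column names mentioned in the question.

```r
# Toy stand-in for the real data: three 0/1 columns (names are made up).
df <- data.frame(
  var1 = c(0, 1, 1, 0, 1),
  var2 = c(1, 1, 1, 0, 0),
  var3 = c(0, 0, 0, 0, 1)
)

# Character vector of the columns to process (150 names in the real case).
x <- c("var1", "var2", "var3")

# Hypothetical helper: returns both counts for a single column.
count_01 <- function(col) {
  c(N = sum(col == 0, na.rm = TRUE),   # number of 0s
    n = sum(col == 1, na.rm = TRUE))   # number of 1s
}

# vapply() calls count_01() once per column and checks that every result
# is a numeric vector of length 2; t() puts one variable per row.
counts <- t(vapply(df[x], count_01, c(N = 0, n = 0)))
counts
```

Using vapply() rather than sapply() is a design choice: it fails early if a column unexpectedly returns something other than two numbers.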

An anecdotal example: I came across an R-script which took about half an hour to complete and made abundant use of for-loops. Replacing the loops with vectorized functions and members of the apply family cut the execution time down to about 3 minutes. So while for-loops and related constructs might be more familiar when coming from another language, they might soon get in your way with R.
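
To make the comparison concrete, here is a small sketch of the same count written once as a loop and once in vectorized form. The data are simulated and the object names are arbitrary; the point is only that both give the same result, not a benchmark.

```r
set.seed(1)  # arbitrary seed so the toy data are reproducible
m <- as.data.frame(matrix(rbinom(5 * 1000, 1, 0.3), ncol = 5))

# Loop version: fill a named result vector one column at a time.
n_loop <- numeric(0)
for (v in names(m)) {
  n_loop[v] <- sum(m[[v]] == 1)
}

# Vectorized version: compare the whole data frame at once,
# then sum each column of the resulting logical matrix.
n_vec <- colSums(m == 1)

all.equal(n_loop, n_vec)
```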

The iteration chapter of Hadley Wickham's R for Data Science gives an introduction to iterating "the R way".

Solution 2:[2]

Here is an approach that doesn't use loops. I've created a data set called df with three factor variables to represent your dataset as you described it, and a function eval() that does all the work. (Note that this name shadows base R's eval() in your workspace, so you may prefer a different name in your own code.)

The function first keeps only the factor columns, then converts them to numeric so that the values sum as 0 and 1; summing the underlying factor codes would use 1 and 2 instead. Inside the function, a helper neg() gives the number of negative values (the 0s) by subtracting the sum of the 1s from the length of the vector. The function then builds the data frames "n" (number of positives), "N" (number of negatives), and PREV95, using pivot_longer to put each one in long format so that every statistic ends up in its own column once they are merged.

I had to leave PREV95 out because I do not have a 'design' object to pass to the function. It is commented out; remove the comment markers to add it back in. left_join then combines the data frames and returns "results", and again the version that includes PREV95 is commented out. I think the logic for PREV95 should work, but I cannot check it without a 'design' object.

eval() takes your original data frame as input and returns a data frame, not a list, which you'll likely find easier to work with.

library(dplyr)
library(tidyr)

set.seed(100)
df <- data.frame(Var1 = factor(sample(c(0,1), 10, TRUE)),
                 Var2 = factor(sample(c(0,1), 10, TRUE)),
                 Var3 = factor(sample(c(0,1), 10, TRUE)))

eval <- function(df){
    
    df1 <- df %>%
        select_if(is.factor) %>%
        mutate_all(function(x) as.numeric(as.character(x)))
    
    neg <- function(x){
        length(x) - sum(x)
    }
    
    n <- df1 %>%
        summarize(across(where(is.numeric), sum)) %>%
        pivot_longer(everything(), names_to = "Var", values_to = "n")
    
    N <- df1 %>%
        summarize(across(where(is.numeric), function(x) neg(x))) %>%
        pivot_longer(everything(), names_to = "Var", values_to = "N")
    
    #PREV95 <- df1 %>%
    #    summarize(across(where(is.numeric), function(x) survey::svyciprop(~x == 1, design = design, level = 0.95,  deff = "replace")*100)) %>%
    #    pivot_longer(everything(), names_to = "Var", values_to = "PREV95")
    
    results <- n %>%
        left_join(N, by = "Var") 
    
    #results <- n %>%
    #    left_join(N, by = "Var") %>%
    #    left_join(PREV95, by = "Var")
    
    return(results)
    
}


eval(df)

  Var       n     N
  <chr> <dbl> <dbl>
1 Var1      2     8
2 Var2      5     5
3 Var3      4     6
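
The factor conversion step in the function above matters because as.numeric() on a factor returns the internal level codes, not the original values. A minimal base R illustration, using a made-up four-element factor:

```r
f <- factor(c(0, 1, 1, 0))

as.numeric(f)                 # internal level codes: 1 2 2 1
as.numeric(as.character(f))   # original values:      0 1 1 0

sum(as.numeric(f))                 # 6, inflated by the level codes
sum(as.numeric(as.character(f)))   # 2, the actual number of 1s
```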

Solution 3:[3]

If you really want to use a for loop, here is how to make it work. Again, I've left out the survey function because I don't have enough information about its parameters to run it.

library(dplyr)

set.seed(100)
df <- data.frame(Var1 = factor(sample(c(0,1), 10, TRUE)),
                 Var2 = factor(sample(c(0,1), 10, TRUE)),
                 Var3 = factor(sample(c(0,1), 10, TRUE)))

VarList <- names(df %>% select_if(is.factor))

results <- list()

for (var in VarList){
    results[[var]][["n"]] <- sum(df[[var]] == 1)
    results[[var]][["N"]] <- sum(df[[var]] == 0)
}


unlist(results)
Var1.n Var1.N Var2.n Var2.N Var3.n Var3.N 
     2      8      5      5      4      6 
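
The survey step could probably be added to such a loop by building the formula from the column name. The following is only a sketch under stated assumptions: the toy data, the weight column w and the unclustered design are invented here, the survey package must be installed, and the deff argument from the question is left out (add it back if your analysis needs it).

```r
# Runs only if the survey package is available.
if (requireNamespace("survey", quietly = TRUE)) {
  set.seed(1)  # arbitrary seed for the simulated data
  # Toy data: two 0/1 variables plus a made-up sampling weight `w`.
  toy <- data.frame(
    var1 = rbinom(200, 1, 0.3),
    var2 = rbinom(200, 1, 0.6),
    w    = runif(200, 1, 3)
  )
  # Hypothetical design without clustering; replace with the real design.
  design <- survey::svydesign(ids = ~1, weights = ~w, data = toy)

  vars <- c("var1", "var2")
  prev <- list()
  for (v in vars) {
    # Build the one-sided formula, e.g. ~I(var1 == 1), from the name.
    f <- as.formula(paste0("~I(", v, " == 1)"))
    p <- survey::svyciprop(f, design, level = 0.95)
    # Point estimate and confidence limits, converted to percent.
    prev[[v]] <- c(estimate = as.numeric(p), attr(p, "ci")) * 100
  }
  prev_tab <- do.call(rbind, prev)  # one row per variable
  print(prev_tab)
}
```

Collecting the per-variable results in a list and binding them once at the end keeps the loop body simple and gives a matrix that is easy to merge with the n and N counts.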

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 I_O
Solution 2
Solution 3 stomper