Add predictors one by one in random forest

Once more I need your help with a syntax problem, and I thank you for it. I have a dataset that looks like this:

y <- rnorm(1000)
x1 <- rnorm(1000) + 0.2 * y
x2 <- rnorm(1000) + 0.2 * x1 + 0.1 * y
x3 <- rnorm(1000) - 0.1 * x1 + 0.3 * x2 - 0.3 * y
data <- data.frame(y, x1, x2, x3)
head(data)  

I need a loop that runs a random forest starting with one predictor and then adds the remaining predictors one at a time, like this:

randomForest(y ~ x1, data = data)
randomForest(y ~ x1 + x2, data = data)
randomForest(y ~ x1 + x2 + x3, data = data)  # and so on

Would you be kind enough to help me? Thank you in advance!



Solution 1:[1]

You can build the formula as a string and convert it with as.formula():

library(randomForest)

# Fit y ~ x1, then y ~ x1 + x2, then y ~ x1 + x2 + x3
lapply(1:3, \(i) {
  formula <- as.formula(paste0("y ~ ", paste0("x", 1:i, collapse = " + ")))
  randomForest(formula, data = data)
})

A more general approach, for example when the predictors are not consistently named or you do not want to hard-code how many there are, is to collect the predictor names in a character vector, say with colnames(), and adjust the loop slightly:

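# Every column name except the first (the response y); indexing the names
# rather than the data frame also works when there is only one predictor
predictors <- colnames(data)[-1]

lapply(seq_along(predictors), \(i) {
  formula <- as.formula(paste0("y ~ ", paste0(predictors[1:i], collapse = " + ")))
  randomForest(formula, data = data)
})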
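
As a possible variation on the loop above, base R's reformulate() can assemble each formula from a character vector without pasting strings by hand. The sketch below is one way to do it, not part of the original answer: the seed, the names `formulas`, `fits` and `oob_mse`, and the final comparison of out-of-bag error are assumptions added for illustration. The fitting step only runs if the randomForest package is installed.

```r
# Recreate the example data from the question
set.seed(1)  # assumption: seed added only so the sketch is reproducible
y  <- rnorm(1000)
x1 <- rnorm(1000) + 0.2 * y
x2 <- rnorm(1000) + 0.2 * x1 + 0.1 * y
x3 <- rnorm(1000) - 0.1 * x1 + 0.3 * x2 - 0.3 * y
data <- data.frame(y, x1, x2, x3)

# All columns except the response, selected by name rather than position
predictors <- setdiff(colnames(data), "y")

# One formula per prefix of the predictor vector:
# y ~ x1, y ~ x1 + x2, y ~ x1 + x2 + x3
formulas <- lapply(seq_along(predictors), function(i) {
  reformulate(predictors[seq_len(i)], response = "y")
})

# Fit one forest per formula, but only if the package is available
if (requireNamespace("randomForest", quietly = TRUE)) {
  fits <- lapply(formulas, function(f) {
    randomForest::randomForest(f, data = data)
  })
  names(fits) <- vapply(
    formulas,
    function(f) paste(deparse(f), collapse = ""),
    character(1)
  )
  # Regression forests keep one out-of-bag MSE per tree in $mse;
  # the last element is the error of the full forest
  oob_mse <- vapply(fits, function(fit) tail(fit$mse, 1), numeric(1))
  print(oob_mse)
}
```

Keeping the formulas in their own list makes it easy to inspect them before fitting, and naming the fitted models by formula shows which predictor set each result belongs to.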

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1