Training random forest (ranger) using caret with custom F4 metric in R runs to completion, then fails with "undefined columns selected"

library(MLmetrics)
library(caret)
library(doSNOW)
library(ranger)

The data is the "bank additional" full data set (linked in the original post). The following code generates data1 from it:

library(VIM)

data1<-hotdeck(data,variable=c('job','marital','education','default','housing','loan'),domain_var = "y",imp_var=FALSE)

#converting the categorical variables to factors as they should be
library(magrittr)
library(dplyr)  # provides mutate_at()
data1%<>%
  mutate_at(colnames(data1)[grepl('factor|logical|character',sapply(data1,class))],factor)

Now, splitting the data:

library(caret)
#splitting data into train/test 70/30
set.seed(1234)
trainIndex<-createDataPartition(data1$y,p=0.7,times = 1,list = F)
train<-data1[trainIndex,-11]
test<-data1[-trainIndex,-11]

levels(train$y)

train$y = as.factor(train$y)
# train$y = factor(train$y,levels = c("yes","no"))

# train$y = relevel(train$y,ref="yes")

I got the idea for a custom F1 metric from "Training Model in Caret Using F1 Metric", and I built f1_val from the F-beta score formula (with beta = 4). What I can't understand is what lev, obs and pred refer to. In my train dataset, only column y seems to correspond to data$obs, and there is no data$pred. Is the error below caused by this, and how can I fix it?

f1 <- function (data, lev = NULL, model = NULL) {
  precision <- precision(data$obs,data$pred)
  recall  <- sensitivity(data$obs,data$pred)
  f1_val <- (17*precision*recall)/(16*precision+recall)
  names(f1_val) <- c("F1")
  f1_val
}
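For reference, a minimal sketch of how such a summary function could look in base R. It assumes the usual caret convention that `data` holds one row per held-out sample with factor columns `obs` (observed class) and `pred` (predicted class), that `lev` is the vector of class levels, and that the first level is the positive class. The name `fbeta_summary`, the `beta` argument and the toy data are illustrative, not part of the original code. Note also that caret's `precision()` and `sensitivity()` take the predictions first and the reference second, so the argument order used above may be swapping precision and recall.

```r
# Sketch of an F-beta summary function in the shape caret expects.
# Assumption: data$obs and data$pred are factors with the same levels,
# and the first entry of `lev` is the positive class.
fbeta_summary <- function(data, lev = NULL, model = NULL, beta = 4) {
  if (is.null(lev)) lev <- levels(data$obs)
  positive <- lev[1]

  # Confusion-matrix counts for the positive class
  tp <- sum(data$pred == positive & data$obs == positive)
  fp <- sum(data$pred == positive & data$obs != positive)
  fn <- sum(data$pred != positive & data$obs == positive)

  precision <- if (tp + fp == 0) NA_real_ else tp / (tp + fp)
  recall    <- if (tp + fn == 0) NA_real_ else tp / (tp + fn)

  # F-beta: (1 + beta^2) * P * R / (beta^2 * P + R); beta = 4 gives 17PR / (16P + R)
  f <- (1 + beta^2) * precision * recall / (beta^2 * precision + recall)
  c(F4 = f)
}

# Toy held-out predictions (made-up values, for illustration only)
toy <- data.frame(
  obs  = factor(c("yes", "yes", "yes", "no", "no", "no"), levels = c("yes", "no")),
  pred = factor(c("yes", "yes", "no", "yes", "no", "no"), levels = c("yes", "no"))
)
res <- fbeta_summary(toy, lev = levels(toy$obs))
```

If a function like this were passed as `summaryFunction`, the `metric` argument of `train()` would need to match the name of the returned value (`"F4"` in this sketch).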



tgrid <- expand.grid(
  .mtry = 1:5,
  .splitrule = "gini",
  .min.node.size = seq(1,500,75)
)


model_caret <- train(train$y~., data = train,
                     method = "ranger",
                     trControl = trainControl(method="cv", 
                                              number = 2, 
                                              verboseIter = T,
                                              classProbs = T,
                                              summaryFunction = f1),
                     tuneGrid = tgrid,
                     num.trees = 500,
                     importance = "impurity",
                     metric = "F1")

After running for 3 to 4 minutes we get the following:

Aggregating results
Selecting tuning parameters
Fitting mtry = 5, splitrule = gini, min.node.size = 1 on full training set

but then this error appears:

Error in `[.data.frame`(data, , all.vars(Terms), drop = FALSE) : 
  undefined columns selected

Also, when evaluating model_caret afterwards, we get:

Error: object 'model_caret' not found
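One likely cause, sketched below in base R, is the formula `train$y ~ .`. Because the response is written as `train$y` rather than `y`, the variables extracted from the formula include the data frame name itself, which is not a column, and `.` no longer excludes `y` from the predictors. The toy data frame and its columns are hypothetical stand-ins for `train`. The second error is consistent with this: if `train()` stops with an error, the assignment to `model_caret` never happens.

```r
# Toy data frame standing in for `train` (hypothetical columns)
toy <- data.frame(
  y  = factor(c("no", "yes", "no", "yes")),
  x1 = c(1, 2, 3, 4)
)

# Formula written as in the question: the response is toy$y, not y
bad_vars <- all.vars(terms(toy$y ~ ., data = toy))  # contains "toy"

# Selecting those names from the data frame fails, since "toy" is not a column
bad_msg <- tryCatch(
  {
    toy[, bad_vars, drop = FALSE]
    NA_character_
  },
  error = function(e) conditionMessage(e)
)

# Formula with a bare column name: every variable is a real column
good_vars <- all.vars(terms(y ~ ., data = toy))
good_ok <- all(good_vars %in% names(toy))
```

Under that assumption, writing the call as `train(y ~ ., data = train, ...)` should avoid the mismatch, and it also keeps `y` out of the predictor set.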

Kindly help. Thanks in advance



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
