R caret package: glm fails with "In Ops.factor(y, 0.5) : ‘-’ not meaningful for factors"

I have the following simplified dataset as an example:

> str(one_year_before)
'data.frame':   3359 obs. of  3 variables:
 $ Default_status            : Factor w/ 2 levels "NO","YES": 1 1 1 2 2 1 1 1 1 1 ...
 $ Average_paydex            : num  79.6 73.3 73.3 66.4 64.9 ...
 $ Average_amount_of_defaults: num  0 0 0 0 0 0 0 0 0 0 ...

And the following code:

library(MASS)
library(caret)
set.seed(567)
# Store row numbers for training set: index_train
index_train <- createDataPartition(y = one_year_before$Default_status,
                                   p = .7, ## The percentage of data in the training set
                                   list = FALSE)

# Create training set: training_set
training_set <- one_year_before[index_train, ]

# Create test set: test_set
test_set <- one_year_before[-index_train, ]

str(training_set)

# Repeated 10-fold cross-validation (20 repeats), scored by ROC AUC

folds <- 10
train_control <- trainControl(method = "repeatedcv", number = folds, repeats = 20,
                              summaryFunction = twoClassSummary,
                              classProbs = TRUE, savePredictions = TRUE)


model <- train(Default_status ~ .,
               data = training_set,
               method = "glm",
               preProcess = c("center", "scale"),
               trControl = train_control,
               metric = "ROC")

I get the following warnings, and the model fits fail:

Warning messages:
1: In Ops.factor(y, 0.5) : ‘-’ not meaningful for factors
2: model fit failed for Fold01.Rep01: parameter=none Error in glm(formula = .outcome ~ ., data = structure(list(Average_paydex = c(0.620189463001776,  : 
  The following terms are causing separation among the sample points: (Intercept), Average_paydex, Average_amount_of_defaults

So far I have relabelled the Default_status column from a factor with levels 0 and 1 to one with levels NO and YES, but that does not help. The same data works perfectly with caret for random forest, but with glm and, for example, glmStepAIC, I get the same error.
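For reference, the relabelling described above can be done with factor(). This is a minimal sketch on a small hypothetical vector (status01 and status are illustrative names, not taken from the question). The relabelling is still worth keeping even though it did not fix this error: with classProbs = TRUE, caret needs class levels that are valid R variable names, and "0" and "1" are not.

```r
# Hypothetical toy stand-in for the original 0/1 outcome column
status01 <- factor(c(0, 0, 1, 0, 1))

# "0" and "1" are not syntactically valid R names: make.names() changes them,
# which is why caret objects to these levels when classProbs = TRUE.
make.names(levels(status01))   # "X0" "X1"

# Relabel the levels; the order of `labels` follows the order of `levels`,
# so 0 becomes "NO" and 1 becomes "YES".
status <- factor(status01, levels = c("0", "1"), labels = c("NO", "YES"))
levels(status)                 # "NO" "YES"
```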

What am I missing?

I would really appreciate any help, as I have spent hours debugging this.

Here is also a link to the dataset as a CSV file: data



Solution 1:[1]

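I managed to solve this. I had the safeBinaryRegression package loaded, and it masks the glm function from the stats package, so caret's "glm" method ended up calling the masking version (the "causing separation among the sample points" message comes from that check). When using caret, make sure safeBinaryRegression is not loaded at the same time.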
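To check whether another package is masking glm before calling train(), base R's find() and search() are enough. A minimal sketch, assuming safeBinaryRegression shows up on the search path under its usual name, package:safeBinaryRegression:

```r
# List every attached environment that provides an object called "glm".
# Normally this is only "package:stats"; a package attached later (such as
# safeBinaryRegression) is listed first because it sits earlier on the search path.
find("glm")

# TRUE when the glm found on the search path is the one from stats.
identical(glm, stats::glm)

# Detach the masking package, if it is attached, before calling caret::train().
if ("package:safeBinaryRegression" %in% search()) {
  detach("package:safeBinaryRegression", unload = TRUE)
}

# After detaching, glm should resolve to stats::glm again.
identical(glm, stats::glm)
```

Detaching, rather than only calling stats::glm yourself, matters here because the glm call is made inside caret's model code, not in your own script.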

Hopefully this solution helps someone :)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

[1] Solution 1