Linear Regression with Caret
I could really use your help. I am trying to write an R script that takes some data and performs glm using the caret package. Here is my code:
library(caret)
set.seed(4000)
# Create training and test data with an 80%-20% split
new_values$gender <- as.factor(new_values$gender)
trainingRows <- createDataPartition(new_values$gender, p = 0.8, list = FALSE, times = 1)
training_data_set <- new_values[trainingRows, ]
test_data_set <- new_values[-trainingRows, ]
# Train with 10-fold cross-validation
fitness_control <- trainControl(method = "cv", number = 10, savePredictions = TRUE)
# Train the model with glm (binomial family, i.e. logistic regression); it takes about 5-10 minutes
linear_regression <- train(gender ~ ., data = training_data_set, method = "glm", family = binomial(), trControl = fitness_control)
linear_regression
Here is the data table: new_data table
When I try to run this script, R takes a really long time, and after that I get this error message:
Something is wrong; all the Accuracy metric values are missing:
    Accuracy       Kappa
 Min.   : NA   Min.   : NA
 1st Qu.: NA   1st Qu.: NA
 Median : NA   Median : NA
 Mean   :NaN   Mean   :NaN
 3rd Qu.: NA   3rd Qu.: NA
 Max.   : NA   Max.   : NA
 NA's   :1     NA's   :1
Error: Stopping
In addition: There were 11 warnings (use warnings() to see them)
The warning messages are:
Warning messages:
1: model fit failed for Fold01: parameter=none Error : protect(): protection stack overflow
2: model fit failed for Fold02: parameter=none Error : protect(): protection stack overflow
3: model fit failed for Fold03: parameter=none Error : protect(): protection stack overflow
4: model fit failed for Fold04: parameter=none Error : protect(): protection stack overflow
5: model fit failed for Fold05: parameter=none Error : protect(): protection stack overflow
6: model fit failed for Fold06: parameter=none Error : protect(): protection stack overflow
7: model fit failed for Fold07: parameter=none Error : protect(): protection stack overflow
8: model fit failed for Fold08: parameter=none Error : protect(): protection stack overflow
9: model fit failed for Fold09: parameter=none Error : protect(): protection stack overflow
10: model fit failed for Fold10: parameter=none Error : protect(): protection stack overflow
11: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, ... : There were missing values in resampled performance measures.
Can you please help?
Solution 1:[1]
Fitting with glmnet seems to work OK, although I haven't looked to see if the answers actually make sense! I had to sort out some data issues, which might have been what was getting in your way ...
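As a rough illustration of why the column types matter, here is a small base-R sketch on made-up data (the `toy` data frame and its `x1`/`x2` columns are hypothetical, not the asker's spreadsheet). When numeric values arrive as character strings, the formula interface treats each distinct string as a factor level, so the design matrix can become very wide. Whether that is what triggered the stack overflow here is an assumption I haven't verified.

```r
# Made-up data: numeric measurements stored as text, as can happen
# when a spreadsheet is imported
set.seed(1)
n <- 50
toy <- data.frame(
  gender = factor(sample(c("F", "M"), n, replace = TRUE)),
  x1 = as.character(round(rnorm(n), 3)),
  x2 = as.character(round(rnorm(n), 3)),
  stringsAsFactors = FALSE
)

# Column classes: x1 and x2 are "character", not "numeric"
col_classes <- sapply(toy, class)
print(col_classes)

# Character predictors get one dummy column per distinct value
# (minus a reference level), so the design matrix is wide
wide <- model.matrix(gender ~ ., data = toy)

# After conversion, each predictor contributes a single column
toy_num <- toy
toy_num[c("x1", "x2")] <- lapply(toy_num[c("x1", "x2")], as.numeric)
narrow <- model.matrix(gender ~ ., data = toy_num)

cat("columns with character predictors:", ncol(wide), "\n")
cat("columns with numeric predictors:  ", ncol(narrow), "\n")
```

The code that follows applies the same kind of conversion to the real spreadsheet.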
library(readxl)
library(caret)
library(glmnet)
library(dplyr)
dd <- (read_excel("thema3_results1.xlsx")
|> select(-1) ## drop row names
|> mutate(across(gender, factor))
|> mutate(across(-gender, as.numeric)) ## convert character to numeric!
)
set.seed(4000)
trainingRows <- createDataPartition(dd$gender, p= .8, list= FALSE, times= 1)
training_data_set <- dd[trainingRows,]
test_data_set <- dd[-trainingRows,]
# Train with 10-fold cross-validation
fitness_control <- trainControl(method = "cv", number = 10, savePredictions = TRUE)
system.time(logistic_reg <- train(gender~ .,
data=training_data_set,
method="glmnet",
family="binomial", ## not binomial() for glmnet ...
trControl=fitness_control))
The training step took about 2 seconds on my machine.
It seems to be getting accuracy == 1, which probably means it's still overfitting ... ???
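One way to probe the suspicion of overfitting would be to score the fitted model on the held-out test set, which the code above creates but never uses. The sketch below does this on simulated data (the `sim` data frame and its columns are made up) and uses plain `method = "glm"` to keep the dependencies small. I'd expect the same `predict()` call to work on the glmnet fit, but I haven't run it on the real data.

```r
library(caret)

# Simulated stand-in for the real data: two numeric predictors,
# with gender loosely related to x1
set.seed(4000)
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$gender <- factor(ifelse(sim$x1 + rnorm(n) > 0, "M", "F"))

# Same 80%-20% split as in the question
rows <- createDataPartition(sim$gender, p = 0.8, list = FALSE)
train_set <- sim[rows, ]
test_set <- sim[-rows, ]

# 10-fold cross-validation on the training portion only
ctrl <- trainControl(method = "cv", number = 10)
fit <- train(gender ~ ., data = train_set, method = "glm",
             family = binomial(), trControl = ctrl)

# Predict classes for rows the model has never seen
pred <- predict(fit, newdata = test_set)

# Cross-tabulate predictions against the truth and compute accuracy
print(table(predicted = pred, actual = test_set$gender))
acc <- mean(pred == test_set$gender)
cat("held-out accuracy:", acc, "\n")
```

If the held-out accuracy on the real data is also 1, it may be worth checking whether one of the predictors encodes gender directly. caret's `confusionMatrix(pred, test_set$gender)` reports the same table along with further statistics.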
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Ben Bolker |
