Variable Importance: Tidymodels versus Caret with Interactions
Why are the variable importance plots different between tidymodels and caret when including interaction terms? I have demonstrated this with the Ames housing data below. I used the same alpha/mixture and lambda/penalty in both models. The only difference between the models is the cross-validation folds (I cannot figure out how to use tidymodels' folds with caret's train). Any ideas on why this is happening?
library(AmesHousing)
library(tidymodels)
library(caret)
library(vip)
df <- data.frame(ames_raw)
head(df)
# replace missing values in numeric columns with the column mean
# (guarding with is.numeric() avoids warnings from mean() on character columns)
for (i in seq_len(ncol(df))) {
  if (is.numeric(df[[i]])) {
    df[is.na(df[[i]]), i] <- mean(df[[i]], na.rm = TRUE)
  }
}
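As an aside, the same mean imputation can be written without the loop using dplyr and tidyr (both attached by `library(tidymodels)`); a sketch equivalent to the loop above:

```r
# mean-impute the numeric columns only, leaving character columns untouched
df <- df %>%
  mutate(across(where(is.numeric), ~ replace_na(.x, mean(.x, na.rm = TRUE))))
```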
# Create a data split object
set.seed(1994)
home_split <- initial_split(df,
                            prop = 0.7,
                            strata = SalePrice)

home_train <- home_split %>% training()
home_test <- home_split %>% testing()
# pre-process recipe
recipe_home <- recipe(SalePrice ~ Yr.Sold + Fireplaces + Full.Bath + Half.Bath + Year.Built + Lot.Area,
data = home_train) %>%
step_interact(terms = ~ Yr.Sold:Fireplaces:Full.Bath:Half.Bath:Year.Built:Lot.Area)
# model with hyperparameters
glmnet_model <- linear_reg(penalty = tune(), # lambda
mixture = tune()) %>% # alpha
set_engine('glmnet') %>%
set_mode('regression')
# model + recipe = workflow
wkfl <- workflow() %>%
add_model(glmnet_model) %>%
add_recipe(recipe_home)
# cv
set.seed(1994)
myfolds <- vfold_cv(home_train,
v = 10,
strata = SalePrice)
# grid search with cv
set.seed(1994)
glmnet_tuning <- wkfl %>%
tune_grid(resamples = myfolds,
            grid = 25, # tune over a 25-candidate space-filling grid of penalty/mixture values
metrics = metric_set(rmse))
glmnet_tuning
# select the best model
best_glmnet_model <- glmnet_tuning %>%
select_best(metric = 'rmse')
best_glmnet_model
# finalize the workflow
final_glmnet_wkfl <- wkfl %>%
finalize_workflow(best_glmnet_model)
# last_fit:
glmnet_final_fit <- final_glmnet_wkfl %>%
last_fit(split = home_split)
# extract the final model
final_glmnet <- extract_workflow(glmnet_final_fit)
# VIP final model
final_glmnet %>%
extract_fit_parsnip() %>%
vip(geom = "point", scale = TRUE)
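When the two importance plots disagree, it can help to look at the raw coefficients that `vip()` is ranking. parsnip's glmnet fits support `tidy()`, which returns the coefficient estimates at the tuned penalty; a quick check:

```r
# inspect the coefficients behind the importance scores
final_glmnet %>%
  extract_fit_parsnip() %>%
  tidy()
```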
set.seed(1994)
myGrid <- expand.grid(lambda = 0.00386,
                      alpha = 0.0874)

# note: caret's train() takes the grid via the tuneGrid argument
model_glmnet <- train(SalePrice ~ (Yr.Sold + Fireplaces + Full.Bath + Half.Bath +
                        Year.Built + Lot.Area)^2,
                      data = home_train,
                      method = "glmnet",
                      tuneGrid = myGrid,
                      metric = "RMSE",
                      maximize = FALSE,
                      trControl = trainControl(method = "cv",
                                               number = 10))
# variable importance
vip(model_glmnet, geom = "point", scale = TRUE)
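On reusing the tidymodels folds in caret: rsample exports `rsample2caret()`, which converts an rsample object into the `index`/`indexOut` lists that `trainControl()` accepts, so both models can resample on identical folds. A sketch, assuming `myfolds` from above:

```r
# convert the rsample folds into caret's index format
caret_folds <- rsample2caret(myfolds)

ctrl <- trainControl(method = "cv",
                     index = caret_folds$index,
                     indexOut = caret_folds$indexOut)
# pass ctrl as trControl to train() so caret resamples on the same folds
```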
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow