How to prep a recipe that includes tunable arguments?

As you can see from my code, I am trying to include feature selection in my tidymodels workflow. I am using some Kaggle data to predict customer churn.

To apply the same preprocessing to the training and test data, I bake the recipe after calling prep().

However, if I want to tune the top_p argument of step_select_roc(), I do not know how to prep() the recipe afterwards. Calling prep() as in my reprex results in an error.

Maybe I have to adapt my workflow and separate some recipe tasks to get the job done. What is the best approach to achieve this?

#### LIBS

suppressPackageStartupMessages(library(tidymodels))
suppressPackageStartupMessages(library(data.table))
suppressPackageStartupMessages(library(themis))
suppressPackageStartupMessages(library(recipeselectors))


#### INPUT

# get dataset from: https://www.kaggle.com/shrutimechlearn/churn-modelling
data <- fread("Churn_Modelling.csv")


# split data
set.seed(seed = 1972) 
train_test_split <-
  rsample::initial_split(
    data = data,     
    prop = 0.80   
  ) 
train_tbl <- train_test_split %>% training() 
test_tbl  <- train_test_split %>% testing() 


#### FEATURE ENGINEERING

# Define the recipe
recipe <- recipe(Exited ~ ., data = train_tbl) %>%
  step_rm(one_of("RowNumber", "Surname")) %>%
  update_role(CustomerId, new_role = "Helper") %>%
  step_num2factor(all_outcomes(),
                  levels = c("No", "Yes"),
                  transform = function(x) {x + 1}) %>%
  step_normalize(all_numeric(), -has_role(match = "Helper")) %>%
  step_dummy(all_nominal(), -all_outcomes()) %>%
  step_corr(all_numeric(), -has_role("Helper")) %>%
  step_nzv(all_predictors()) %>%
  step_select_roc(all_predictors(), outcome = "Exited", top_p = tune()) %>%  
  prep()


# Bake it
train_baked <- recipe %>%  bake(train_tbl)
test_baked <- recipe %>% bake(test_tbl) 


Solution 1:[1]

Thanks to the help of Steven Pawley, I was able to integrate the tunable top_p argument of step_select_roc() into my tidymodels workflow. As Julia Silge mentioned, it is not possible to prep a recipe that still contains tunable arguments. So if you still want to prep and bake your recipe, you can only do so after you have finalized your model and recipe, as in the following example:

suppressPackageStartupMessages(library(tidymodels))
suppressPackageStartupMessages(library(doParallel))
suppressPackageStartupMessages(library(recipeselectors))
suppressPackageStartupMessages(library(finetune))

data(cells, package = "modeldata")

cells <- cells %>% select(-case)
set.seed(31)
split <- initial_split(cells, prop = 0.8)
train <- training(split)
test <- testing(split)

rec <-
    recipe(class ~ ., data = train) %>%
    step_corr(all_predictors(), threshold = 0.9) %>% 
    step_select_roc(all_predictors(), outcome = "class", top_p = tune())

# xgboost model
xgb_spec <- boost_tree(
    trees = tune(), 
    tree_depth = tune(), min_n = tune(), 
    loss_reduction = tune(),                    
    sample_size = tune(), mtry = tune(),         
    learn_rate = tune(),                        
    stop_iter = tune()
) %>% 
    set_engine("xgboost") %>% 
    set_mode("classification")

# grid
xgb_grid <- grid_latin_hypercube(
    trees(),
    tree_depth(),
    min_n(),
    loss_reduction(),
    sample_size = sample_prop(),
    finalize(mtry(), train),
    learn_rate(),
    stop_iter(range = c(5L,50L)),
    size = 5
)

rec_grid <- grid_latin_hypercube(
    parameters(rec) %>% 
        update(top_p = top_p(c(0,30))) ,
    size = 5
)

comp_grid <- merge(xgb_grid, rec_grid)

model_metrics <- metric_set(roc_auc)  


# resample the training set only, so the held-out test rows do not leak into tuning
rs <- vfold_cv(train)

ctrl <- control_grid(pkgs = "recipeselectors")

cores <- parallel::detectCores(logical = FALSE)
cl <- makePSOCKcluster(cores)
registerDoParallel(cl)
set.seed(234)
rfe_res <-
    xgb_spec %>% 
    tune_grid(
        preprocessor = rec,
        resamples = rs,
        grid = comp_grid,
        control = ctrl
    )
stopCluster(cl)


best <- rfe_res %>% select_best("roc_auc")

# finalize
final_mod <- finalize_model(xgb_spec, best)
final_rec <- finalize_recipe(rec, best)

# bakery: prep the finalized recipe once, then bake both sets
final_prep <- final_rec %>% prep()
bake_test <- final_prep %>% bake(new_data = testing(split))
bake_train <- final_prep %>% bake(new_data = training(split))
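
The finalize-then-prep pattern at the end of the answer can also be tried in isolation, without running a full tuning job. The following is only a minimal sketch under a few assumptions that are not in the original post: it uses the built-in mtcars data, it tunes the threshold argument of step_corr() as a stand-in for top_p of step_select_roc() (so that recipeselectors is not required), and it fills in an arbitrary value of 0.9 by hand where select_best() would normally supply the value.

```r
library(recipes)
library(tune)

# Recipe with a tune() placeholder. step_corr()/threshold is a stand-in
# (assumption) for step_select_roc()/top_p from the post.
rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_corr(all_numeric_predictors(), threshold = tune())

# Calling prep(rec) at this point is expected to fail, because the
# placeholder has no concrete value yet.

# Replace the placeholder with a concrete value. The 0.9 is arbitrary
# (assumption); in the answer this row comes from select_best().
chosen <- tibble::tibble(threshold = 0.9)
final_rec <- finalize_recipe(rec, chosen)

# Once finalized, the recipe can be prepped and baked as usual.
final_prep <- prep(final_rec)
baked <- bake(final_prep, new_data = NULL)
```

The column name in the one-row tibble has to match the id of the tunable argument, which defaults to the argument name when tune() is called without an id.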

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Leonhard Geisler