Follow-Up Question About Whether Preprocessing the Test Set Is Needed

Please refer to the previous question here (https://stackoverflow.com/a/71389007/17537724)

With the pipeline below, will imputation, scaling, and dummy encoding be applied automatically to the test set when predicting?

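library(mlr3)
library(mlr3pipelines)
library(mlr3proba)           # survival tasks and learners
library(mlr3extralearners)   # provides lrn("surv.rfsrc")

rsf = as_learner(po("imputemedian") %>>%
                 po("imputemode")   %>>%
                 po("scale")        %>>%
                 po("encode")       %>>%
                 lrn("surv.rfsrc"))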
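
A minimal sketch of the behaviour being asked about, under stated assumptions: `classif.rpart` stands in for `surv.rfsrc` so that only `mlr3`, `mlr3pipelines`, and `rpart` are needed, and the data frame `dat` is made up for illustration. In mlr3pipelines, each PipeOp stores its state when the graph learner is trained and applies that stored state to new data at prediction time, so the test rows are not preprocessed by hand here. The `$model` accessors used to inspect the stored state reflect my understanding of the API and may differ between package versions.

```r
library(mlr3)
library(mlr3pipelines)

set.seed(1)
n = 100
# Hypothetical toy data with missing values in a numeric and a factor column
dat = data.frame(
  x1 = rnorm(n),
  x2 = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  y  = factor(sample(c("yes", "no"), n, replace = TRUE))
)
dat$x1[c(3, 50, 95)] = NA
dat$x2[c(7, 60, 99)] = NA

task = as_task_classif(dat, target = "y")

# Same preprocessing steps as in the question, with a stand-in learner
glrn = as_learner(po("imputemedian") %>>%
                  po("imputemode")   %>>%
                  po("scale")        %>>%
                  po("encode")       %>>%
                  lrn("classif.rpart"))

# Training fits every PipeOp (medians, modes, centers, ...) on rows 1:80 only
glrn$train(task, row_ids = 1:80)

# Prediction pushes rows 81:100 through the already-trained PipeOps
pred = glrn$predict(task, row_ids = 81:100)

# Inspect what was learned from the training rows (accessors are assumptions)
print(glrn$model$imputemedian$model)
print(glrn$model$scale$center)
```

The same pattern should carry over to the survival pipeline above once it has been trained on a survival task.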

Another question: suppose I create a learner with specific hyperparameters, for example taken from a published model, and want to use it for prediction only, without training. What would happen if I used it on two different data sets? Would I need to de-select non-influential variables from the data? I assume so, because the model has not been trained and would therefore use all variables.

rsf = as_learner(po("imputemedian") %>>%
                 po("imputemode")   %>>%
                 po("scale")        %>>%
                 po("encode")       %>>%
                 lrn("surv.rfsrc",
                     ntree    = 1200,
                     mtry     = 2,
                     nodesize = 10,
                     nsplit   = 1))
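
On the prediction-only scenario, one point that can be checked directly: hyperparameters configure how a model is fitted, but they are not a fitted model, so an mlr3 learner that has never been trained holds no model to predict with. The sketch below illustrates this with `classif.rpart` on the built-in `iris` task as a stand-in for `surv.rfsrc`; the hyperparameter values are arbitrary and the variable names are hypothetical.

```r
library(mlr3)

# A learner with fixed hyperparameters that has never been trained
untrained = lrn("classif.rpart", minsplit = 10, cp = 0.01)
task = tsk("iris")

# No fitted model is stored yet
print(is.null(untrained$model))

# Calling $predict() before $train() raises an error; capture its message
result = tryCatch(
  untrained$predict(task),
  error = function(e) conditionMessage(e)
)
print(result)
```

Reusing a published model for prediction would therefore seem to require either the fitted model object itself or retraining with the published hyperparameters on suitable data.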

mlr3

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
