Follow-up question: is preprocessing the test set needed?
Please refer to my previous question here (https://stackoverflow.com/a/71389007/17537724).
With the pipeline below, will imputation, scaling, and dummy encoding be applied automatically to the test set when predicting?
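library(mlr3)
library(mlr3pipelines)
library(mlr3proba)          # survival tasks and learners
library(mlr3extralearners)  # provides lrn("surv.rfsrc")
# Preprocessing steps chained in front of the random survival forest
rsf = as_learner(po("imputemedian") %>>%
  po("imputemode") %>>%
  po("scale") %>>%
  po("encode") %>>%
  lrn("surv.rfsrc"))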
Another question: suppose I create a learner with specific hyperparameters, for example taken from a published model, and I want to use it for prediction only, without training. What would happen if I use two different data sets? Do I need to deselect non-influential variables from the data set? I assume so, because the model is not trained and all variables would therefore be used.
# Same pipeline, with the hyperparameters fixed in advance
rsf = as_learner(po("imputemedian") %>>%
  po("imputemode") %>>%
  po("scale") %>>%
  po("encode") %>>%
  lrn("surv.rfsrc",
    ntree = 1200,
    mtry = 2,
    nodesize = 10,
    nsplit = 1))
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow