Target encoding in the tidymodels framework using embed

I would like to do target encoding for a categorical variable with too many levels.

I have seen this vignette, which proposes the following steps to target encode a variable:

step_lencode_glm()
step_lencode_bayes() 
step_lencode_mixed()

All three steps estimate the encoding from every training record, including the row being encoded, so the model tends to overfit to that column.

Using tidymodels, is there an easy way to split my training set into 5 folds and compute the target encoding for each fold from the other 4 folds?
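
To make the idea concrete, here is a minimal sketch of the out-of-fold encoding I have in mind, written by hand with rsample and dplyr. It uses plain per-level outcome means rather than any of the step_lencode_*() estimators, and the data set and the names train, city, y, .row, city_enc and encode_fold are made up for illustration.

```r
library(rsample)
library(dplyr)

set.seed(123)

# Hypothetical training set: `city` is the categorical predictor,
# `y` is a numeric outcome, `.row` keeps the original row order.
train <- tibble(
  city = sample(letters[1:8], 200, replace = TRUE),
  y    = rnorm(200)
) %>%
  mutate(.row = row_number())

# 5-fold split of the training set
folds <- vfold_cv(train, v = 5)

encode_fold <- function(split) {
  # Rows from the other 4 folds, used to estimate the encoding
  fit_rows <- analysis(split)

  # Per-level mean of the outcome, computed without the held-out fold
  level_means <- fit_rows %>%
    group_by(city) %>%
    summarise(city_enc = mean(y), .groups = "drop")

  # Attach the encoding to the held-out fold; levels not seen in the
  # other 4 folds fall back to the overall mean of those folds
  assessment(split) %>%
    left_join(level_means, by = "city") %>%
    mutate(city_enc = coalesce(city_enc, mean(fit_rows$y)))
}

# Every training row is encoded exactly once, from folds it does not belong to
train_encoded <- bind_rows(lapply(folds$splits, encode_fold)) %>%
  arrange(.row)
```

With this scheme, new data at prediction time would presumably be encoded with level means estimated on the full training set, since it has no fold of its own.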

Thanks



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
