Category "categorical-data"

Dummy variables: must I scale* them if I scale the whole dataset, or leave them unscaled? *(center, scale, normalize...)

When working with scaled data, should dummy variables also be scaled, or should they be left unscaled? Can ML algorithms produce different results and whi

SMOTE_NC function in R: error in the output

Thank you in advance for your time! I'm having some trouble with the SMOTE_NC function in R (https://rdrr.io/github/dongyuanwu/RSBID/man/SMOTE_NC.html). Shortly

How to get categorical values in catboost

This is my data. I created a model with CatBoostClassifier(). I can get the feature names list with: >>> model.feature_names_ ['title', 'value'] Firs

How to get the relation between categorical and numerical variables of a dataframe?

I have a dataframe with 49 columns. Most of them are categorical (dtype object), some are numerical. As I'm a newbie in data science, I tried to plot the Pearson

Error: cannot allocate vector of size X Gb in RStudio

Never had this problem before but now it's constantly there for any piece of code I write. > sessionInfo() R version 4.0.2 (2020-06-22) Platform: x86_64-w64-

Feature-Engine RareLabelEncoder: ValueError: could not convert string to float: 'Rare'

from sklearn.compose import make_column_transformer from sklearn.preprocessing import StandardScaler from feature_engine.encoding import RareLabelEncoder from f

Getting ValueError: y contains new labels when using scikit-learn's LabelEncoder

I have a series like: df['ID'] = ['ABC123', 'IDF345', ...] I'm using scikit-learn's LabelEncoder to convert it to numerical values to be fed into the RandomForestC