Category "cross-validation"

CatBoost overfits the training data, but test performance increases

I'm training CatBoost on a dataset of 41k observations and ~60 features. The dataset is a spatially distributed longitudinal series covering 9 years. At t

Difference between GroupShuffleSplit and GroupKFold

As the title says, I want to know the difference between sklearn's GroupKFold and GroupShuffleSplit. Both make train-test splits for data that has a group
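A minimal sketch of the difference, assuming scikit-learn and NumPy are installed; the toy arrays `X`, `y`, and `groups` are hypothetical. GroupKFold partitions the groups so that every group lands in exactly one test fold, while GroupShuffleSplit draws each split independently at random, so across splits a group can appear in several test sets or in none. In both cases a group is never split between train and test.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, GroupShuffleSplit

# Hypothetical toy data: 12 samples, 6 groups of 2 samples each.
X = np.arange(12).reshape(-1, 1)
y = np.zeros(12)
groups = np.repeat(np.arange(6), 2)

def split_groups(splitter):
    """Return (train_groups, test_groups) sets for every split."""
    return [
        (set(groups[train_idx].tolist()), set(groups[test_idx].tolist()))
        for train_idx, test_idx in splitter.split(X, y, groups)
    ]

# GroupKFold: 3 folds, each group is in the test set exactly once.
gkf_splits = split_groups(GroupKFold(n_splits=3))

# GroupShuffleSplit: 3 independent random splits, each holding out
# 2 groups (an integer test_size counts groups, not samples).
gss_splits = split_groups(GroupShuffleSplit(n_splits=3, test_size=2, random_state=0))

for name, splits in [("GroupKFold", gkf_splits), ("GroupShuffleSplit", gss_splits)]:
    print(name, [sorted(test) for _, test in splits])
```

GroupKFold is the choice when every group should be evaluated exactly once; GroupShuffleSplit is closer to repeated random hold-out and lets you pick the test fraction independently of the number of splits.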

Data partitioning function createDataPartition cross-validation problem

I am trying to get predictions from a multivariable model. The dataset, eplt, is made of 7 scores and one final exam score, moy_exam2, and I want to predict the latter usi

Does sklearn's LogisticRegressionCV use all the data for the final model?

I was wondering how the final model (i.e., the decision boundary) of LogisticRegressionCV in sklearn is calculated. Say I have some Xdata and ylabels such that
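A minimal sketch, assuming scikit-learn and its default `refit=True`: after cross-validation selects `C_`, LogisticRegressionCV refits on the full `X`, `y`. The synthetic data from `make_classification` is hypothetical; the check compares the result with a plain `LogisticRegression` trained on all the data with the same `C`. The tight `tol` values are an assumption chosen so both solvers converge to the same optimum.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic binary classification data, standardized.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X = StandardScaler().fit_transform(X)

# Cross-validate over 10 candidate C values; refit=True (the default)
# refits the final model on all of X, y with the selected C.
cv_model = LogisticRegressionCV(Cs=10, cv=5, tol=1e-8, max_iter=10_000).fit(X, y)
best_C = cv_model.C_[0]

# A plain LogisticRegression trained on the full data with that C.
full_model = LogisticRegression(C=best_C, tol=1e-8, max_iter=10_000).fit(X, y)

print("selected C:", best_C)
print("max coef difference:", np.abs(cv_model.coef_ - full_model.coef_).max())
```

If the coefficients agree, the final decision boundary comes from a fit on all rows, not from any single fold; with `refit=False` the fold coefficients are averaged instead.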

RandomForestClassifier instance not fitted yet. Call 'fit' with appropriate arguments before using this method

I am trying to train a decision tree model, save it, and then reload it when I need it later. However, I keep getting the following error: This DecisionTre
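A minimal sketch, assuming scikit-learn and joblib (installed alongside scikit-learn). This error typically appears when `predict` is called on an instance whose `fit` was never run, for example when an unfitted object is saved or a new, unfitted object is used after reloading. The sketch fits first, persists the fitted estimator with `joblib.dump`, and uses the object returned by `joblib.load`. The data and the temporary file path are hypothetical.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.exceptions import NotFittedError

# Hypothetical training data.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# An instance that was never fitted raises NotFittedError on predict.
unfitted = RandomForestClassifier(random_state=0)
try:
    unfitted.predict(X)
    raised = False
except NotFittedError:
    raised = True

# Fit first, then save the *fitted* estimator.
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
path = os.path.join(tempfile.mkdtemp(), "model.joblib")  # hypothetical location
joblib.dump(model, path)

# Later: use the object returned by joblib.load directly.
reloaded = joblib.load(path)
same_predictions = bool((reloaded.predict(X) == model.predict(X)).all())

print("unfitted raised NotFittedError:", raised)
print("reloaded predictions match:", same_predictions)
```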

Does the caret package for R properly implement repeated CV when passed a multifold object to trainControl's index option?

I'm hoping the answer to this question is a quick "yes" or "no," but I cannot find it explicitly in the caret documentation or elsewhere online. I want to perfor

Optuna CatBoost pruning

Is there a way to use pruning with CatBoost and Optuna? In LightGBM it's easy, but for CatBoost I can't find any hint. My code looks like this: def objective(trial)

How can I extract x_train and y_train from train_generator?

In my CNN model, I want to extract X_train and y_train from train_generator. I want to use ensemble learning (bagging and boosting) to evaluate the model. The mai

pandas: create cross-validation folds based on specific columns

I have a dataframe of a few hundred rows that can be grouped by id as follows: df = Val1 Val2 Val3 Id 2 2 8 b 1 2 3 a 5
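A minimal sketch of group-based folds, assuming scikit-learn's GroupKFold is acceptable in place of a pure-pandas split. The frame below is hypothetical data shaped like the one described (Val1 to Val3 plus an Id column); passing `df["Id"]` as `groups` keeps all rows of an Id in the same fold.

```python
import pandas as pd
from sklearn.model_selection import GroupKFold

# Hypothetical frame with the same columns as described (Val1..Val3, Id).
df = pd.DataFrame({
    "Val1": [2, 1, 5, 3, 4, 2, 6, 1],
    "Val2": [2, 2, 1, 3, 2, 4, 1, 5],
    "Val3": [8, 3, 7, 2, 6, 1, 9, 4],
    "Id":   ["b", "a", "c", "a", "b", "d", "c", "d"],
})

folds = []
# The Id column is the grouping key; no Id is split across train and test.
for train_idx, test_idx in GroupKFold(n_splits=2).split(df, groups=df["Id"]):
    train, test = df.iloc[train_idx], df.iloc[test_idx]
    folds.append((set(train["Id"]), set(test["Id"])))
    print("train Ids:", sorted(train["Id"].unique()), "| test Ids:", sorted(test["Id"].unique()))
```

In a real model you would drop `Id` from the feature columns before fitting; here the frame is passed to `split` only to define the row count.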

GridSearchCV on LogisticRegression in scikit-learn

I am trying to optimize a logistic regression function in scikit-learn by using a cross-validated grid parameter search, but I can't seem to implement it. It
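A minimal sketch of a cross-validated grid search over LogisticRegression's `C`, assuming scikit-learn; the synthetic data and the candidate `C` values are hypothetical. Wrapping the scaler and model in a Pipeline means each fold's scaler is fit only on that fold's training part, and grid keys take the step-name prefix (`clf__C`).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Scaling inside the pipeline avoids leaking test-fold statistics.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Parameter names are prefixed with the pipeline step name ("clf__").
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}

grid = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
grid.fit(X, y)

print("best params:", grid.best_params_)
print("best CV accuracy:", round(grid.best_score_, 3))
```

With the default `refit=True`, `grid.best_estimator_` is the whole pipeline refit on all of `X`, `y` with the best `C`, so it can be used directly for prediction.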