LightGBM error: ValueError: For early stopping, at least one dataset and eval metric is required for evaluation
I am trying to train a LightGBM model with GridSearchCV, and I get the error below when I try to fit it:
ValueError: For early stopping, at least one dataset and eval metric is required for evaluation
I have provided a validation dataset and an evaluation metric, so I am not sure why I still get this error. Here is my code:
train_data = rtotal[rtotal['train_Y'] == 1]
test_data = rtotal[rtotal['train_Y'] == 0]
trainData, validData = train_test_split(train_data, test_size=0.007, random_state = 123)
#train data prep
X_train = trainData.iloc[:,2:71]
y_train = trainData.loc[:,['a_class']]
#validation data prep
X_valid = validData.iloc[:,2:71]
y_valid = validData.loc[:,['a_class']]
#X_test
X_test = test_data.iloc[:,2:71]
import lightgbm as lgb
from sklearn.model_selection import GridSearchCV
gridParams = {
    'learning_rate': [0.005],
    'n_estimators': [40],
    'num_leaves': [16, 32, 64],
    'objective': ['multiclass'],
    'random_state': [501],
    'num_boost_round': [3000],
    'colsample_bytree': [0.65, 0.66],
    'subsample': [0.7, 0.75],
    'reg_alpha': [1, 1.2],
    'reg_lambda': [1, 1.2, 1.4],
}
lgb_estimator = lgb.LGBMClassifier(boosting_type='gbdt',
                                   n_estimators=500,
                                   objective='multiclass',
                                   learning_rate=0.05,
                                   num_leaves=64,
                                   eval_metric='multi_logloss',
                                   verbose_eval=20,
                                   eval_set=[X_valid, y_valid],
                                   early_stopping_rounds=100)
g_lgbm = GridSearchCV(estimator=lgb_estimator, param_grid=gridParams, n_jobs=3, cv=3)
lgb_model = g_lgbm.fit(X=X_train, y=y_train)
Solution 1
From what I see in the code provided, you have a couple of problems:
You define your classification as multiclass, but it is not exactly that: your output is a single column which, I believe, simply holds the class labels. LGBMClassifier infers binary vs. multiclass from y on its own, so drop the explicit objective (see the quick label check after this list); the binary_logloss in the training log below shows the task was in fact binary.
If you want early stopping, you need to provide a validation set, exactly as the error message states, and you need to pass it to the fit method via eval_set, not to the constructor.
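For the first point, a quick sanity check settles whether the task is binary or multiclass before you pick an objective. A minimal sketch, assuming the trainData frame from the question:
import numpy as np
# Count the distinct class labels in the target column:
# two means binary, three or more means multiclass.
classes = np.unique(trainData['a_class'])
print(len(classes), classes)
LGBMClassifier performs the same inference internally when no objective is passed.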
If you correct your code for these two errors, it will happily run:
gridParams = {
    'learning_rate': [0.005],
    'n_estimators': [40],
    'num_leaves': [16, 32, 64],
    'random_state': [501],
    'num_boost_round': [3000],
    'colsample_bytree': [0.65, 0.66],
    'subsample': [0.7, 0.75],
    'reg_alpha': [1, 1.2],
    'reg_lambda': [1, 1.2, 1.4],
}
lgb_estimator = lgb.LGBMClassifier(boosting_type='gbdt',
                                   n_estimators=500,
                                   learning_rate=0.05,
                                   num_leaves=64,
                                   eval_metric='logloss',
                                   verbose_eval=20,
                                   early_stopping_rounds=10)
g_lgbm = GridSearchCV(estimator=lgb_estimator, param_grid=gridParams, n_jobs=3, cv=3)
lgb_model = g_lgbm.fit(X=X_train, y=y_train, eval_set=[(X_valid, y_valid)])
...
[370] valid_0's binary_logloss: 0.422895
[371] valid_0's binary_logloss: 0.423064
[372] valid_0's binary_logloss: 0.422681
[373] valid_0's binary_logloss: 0.423206
[374] valid_0's binary_logloss: 0.423142
[375] valid_0's binary_logloss: 0.423414
[376] valid_0's binary_logloss: 0.423338
[377] valid_0's binary_logloss: 0.423864
[378] valid_0's binary_logloss: 0.42381
[379] valid_0's binary_logloss: 0.42409
[380] valid_0's binary_logloss: 0.423476
[381] valid_0's binary_logloss: 0.423759
[382] valid_0's binary_logloss: 0.423804
Early stopping, best iteration is:
[372] valid_0's binary_logloss: 0.422681
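Once the search finishes, you can inspect the winning configuration and, if early stopping fired during the refit, the best iteration. A minimal sketch, assuming the fitted g_lgbm from above and the X_test prepared in the question; note that best_iteration_ is only set when early stopping actually triggered:
# Best hyperparameter combination found by the grid search
print(g_lgbm.best_params_)
# GridSearchCV refits the best configuration on the training data;
# best_iteration_ holds the round early stopping settled on
best = g_lgbm.best_estimator_
print(best.best_iteration_)
# Predict on the held-out test rows
preds = best.predict(X_test)
One caveat: on LightGBM 4.x the early_stopping_rounds and verbose_eval arguments shown above were removed, and the idiomatic way is to configure both through callbacks passed to fit:
lgb_model = g_lgbm.fit(X=X_train, y=y_train,
                       eval_set=[(X_valid, y_valid)],
                       callbacks=[lgb.early_stopping(stopping_rounds=10),
                                  lgb.log_evaluation(20)])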
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow