python xgboost continue training on existing model

Let's say I build an XGBoost model:

bst = xgb.train(param0, dtrain1, num_round, evals=[(dtrain1, "training")])

Where:

  • param0 is a dict of XGBoost parameters,
  • dtrain1 is a DMatrix ready for training,
  • num_round is the number of boosting rounds.

Then, I save the model to disk:

bst.save_model("xgbmodel")

Later on, I want to reload the saved model and continue training it with dtrain2.

Does anyone have an idea how to do it?



Solution 1:[1]

For users who want to continue training with an XGBClassifier (or any estimator trained via sklearn's .fit API):

from xgboost import XGBClassifier

# best_est = best number of trees (n_estimators)
# best_lr = best learning rate
# best_subsample = best subsample fraction, between 0 and 1

params = {'objective': 'binary:logistic', 'use_label_encoder': False, 
          'seed': 27, 'eval_metric': 'logloss', 'n_estimators': best_est, 
          'learning_rate': best_lr, 'subsample': best_subsample}

# train iteration 1  below

model = XGBClassifier(**params)
model.fit(x_train_1, y_train_1)

# train iteration 2 below

model = model.fit(x_train_2, y_train_2, xgb_model=model.get_booster())

In the above code, x_train_* and y_train_* are pandas DataFrame objects.

The key concept here is that XGBoost's core training functions, when continuing training, always take an existing booster as input. You can supply that booster either from the model object (via model.get_booster()) or as a saved model path.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 MSS