Why is there a difference in MSE when training an XGBoost model incrementally on batches vs. training on the entire data?

I'm seeing a difference in MSE when training an XGBoost model incrementally on batches versus training it on the entire data in one go.

Dataset shapes:

    X_train size: (8500, 4)
    X_val size:   (637, 4)
    X_test size:  (200, 4)
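For reference, the code that produces this split isn't shown; here is a minimal sketch of a chronological split with the same sizes, using hypothetical stand-in data (time-series splits are usually done in order rather than at random):

    import numpy as np

    # Hypothetical chronological split matching the shapes above
    # (8500 / 637 / 200 rows); the actual split code isn't in the question.
    X = np.random.rand(9337, 4)   # stand-in for the real feature matrix
    y = np.random.rand(9337)      # stand-in for the real target

    X_train, y_train = X[:8500], y[:8500]
    X_val,   y_val   = X[8500:9137], y[8500:9137]
    X_test,  y_test  = X[9137:],     y[9137:]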

My training data (X_train) has four columns (% Usage, hour, dayofweek, dayofmonth) and looks like this:

         % Usage  hour  dayofweek  dayofmonth
    0  14.265347    22          0          24
    1  14.265347    22          0          24
    2  13.996887    22          0          24
    3  13.775730    22          0          24
    4  13.775730    22          0          24

and the target (y_train):

0       14.265347
1       13.996887
2       13.775730
3       13.775730
4       14.269257
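Comparing the sample rows, y_train looks like the next row's % Usage, i.e. a one-step-ahead forecasting target. Here is a minimal sketch of how such features and target could be built, assuming the % Usage values come as a series with a DatetimeIndex (the actual feature-engineering code isn't in the question):

    import pandas as pd

    # Hypothetical construction; 'usage' stands in for the real % Usage series.
    # 2021-05-24 was a Monday (dayofweek == 0), matching the sample rows.
    usage = pd.Series(
        [14.265347, 14.265347, 13.996887, 13.775730, 13.775730, 14.269257],
        index=pd.date_range("2021-05-24 22:00", periods=6, freq="10min"),
    )

    X = pd.DataFrame({
        "% Usage": usage,
        "hour": usage.index.hour,
        "dayofweek": usage.index.dayofweek,
        "dayofmonth": usage.index.day,
    })
    y = usage.shift(-1)   # one-step-ahead target; the last row is NaN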

The XGBoost model is trained incrementally. What I'm doing here is loading the model from the previous batch (if it exists) and continuing training; once a batch is done, I save the model so the next batch can pick it up. I'm trying to mimic checkpoint-like behaviour:

    import os

    import xgboost as xgb
    from sklearn.metrics import mean_squared_error

    batch_size = 850

    xgb_model = xgb.XGBRegressor(n_estimators=1000)

    for start in range(0, len(X_train), batch_size):
        # Skip batches that already have a checkpoint on disk
        if f'xgb_model_{start}.model' in os.listdir():
            print(f"Skipping for batch {start}:{start+batch_size}")
            continue

        # Continue from the previous batch's checkpoint (None on the first batch)
        prev_model = f'xgb_model_{start-batch_size}.model' if start > 0 else None

        xgb_model.fit(
            X_train[start:start+batch_size],
            y_train[start:start+batch_size],
            eval_set=[(X_val, y_val)],
            early_stopping_rounds=50,
            verbose=False,
            xgb_model=prev_model,
        )
        xgb_model.save_model(f'xgb_model_{start}.model')

        y_pred = xgb_model.predict(X_test)
        print(f"MSE : {mean_squared_error(y_test, y_pred)}")

Output:

MSE : 0.8678093773264584
MSE : 2.046533869862948
MSE : 1.1568086137077669
MSE : 2.291347951272582
MSE : 1.5389712184418989
MSE : 1.4457848862752014
MSE : 1.7740441472551185
MSE : 4.179429599396931
MSE : 6.211388954159769
MSE : 4.753687392359755

Not only is the MSE higher, it also keeps increasing as training proceeds through the batches. But if I train the model with the same parameters on the entire dataset, the MSE is much lower:

    xgb_model = xgb.XGBRegressor(n_estimators=1000)

    xgb_model.fit(
        X_train,
        y_train,
        eval_set=[(X_val, y_val)],
        early_stopping_rounds=50,
        verbose=False,
    )

    y_pred = xgb_model.predict(X_test)
    print(f"MSE : {mean_squared_error(y_test, y_pred)}")

Output:

MSE : 0.4356189240236812

There's a noticeable difference in the Mean Squared Error: 0.435 vs 4.753. Why is this so? Shouldn't they be almost the same, if not equal?
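One thing worth checking is what continued training actually does to the model: when fit() is given xgb_model=<checkpoint>, XGBoost adds new boosting rounds on top of the loaded trees rather than refitting the existing ones, so each batch only corrects the previous ensemble's residuals on that batch's data. Here is a minimal sketch to inspect the checkpoints produced by the loop above (assuming they sit in the working directory):

    import glob
    import xgboost as xgb

    # Count the trees inside each saved checkpoint. With early stopping the
    # number of rounds added per batch varies, but the totals should grow
    # batch over batch if each fit() appends trees to the loaded model.
    def batch_index(path):
        # 'xgb_model_850.model' -> 850
        return int(path.split('_')[-1].split('.')[0])

    for path in sorted(glob.glob('xgb_model_*.model'), key=batch_index):
        booster = xgb.Booster()
        booster.load_model(path)
        print(path, '->', len(booster.get_dump()), 'trees')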



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
