XGBoost iterative training: Not having all 0,...,C labels in minibatch without erroring

When training XGBoost iteratively on data too large to fit in memory, one may want to train in batches. The problem, however, is that a given batch may not contain every label in 0,...,C. This leads to the error: ValueError: The label must consist of integer labels of form 0, 1, 2, ..., [num_class-1]

Is there a way to train XGBoost on a batch that contains only a subset of the labels, possibly without label 0?

The code has structure similar to this:

import xgboost as xgb
from sklearn.metrics import accuracy_score

train = module.trainloader
test = module.valloader

# Train on one minibatch to get started
sample = next(iter(train))
X = xgb.DMatrix(sample[0].numpy(), label=sample[1].numpy())

params = {
    'learning_rate': 0.007,
    'updater': 'refresh',
    'process_type': 'update',
}

# Get initial model training
model = xgb.train(params, dtrain=X)

for i, (trainsample, valsample) in enumerate(zip(train, test)):
    X_train, y_train = trainsample
    X_test, y_test = valsample

    # Convert the batch tensors to DMatrix objects
    X_train = xgb.DMatrix(X_train.numpy(), label=y_train.numpy())
    X_test = xgb.DMatrix(X_test.numpy())

    # Continue from the model fitted on the previous batches
    model = xgb.train(params, dtrain=X_train, xgb_model=model)

    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test.numpy(), y_pred)

    print(accuracy)
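
To make the missing-label situation described above concrete, here is a minimal sketch that is not part of the original post. It assumes the total number of classes is known in advance (the value one would pass as XGBoost's `num_class` parameter) and checks a batch against that fixed range instead of inferring the classes from the batch itself. The names `check_batch_labels` and `NUM_CLASS` are hypothetical, and the example labels are made up for illustration.

```python
from typing import Iterable, Set


def check_batch_labels(batch_labels: Iterable[int], num_class: int) -> Set[int]:
    """Validate one minibatch against a globally fixed label range.

    Returns the classes in 0..num_class-1 that are absent from the batch.
    Raises ValueError only if a label falls outside that range.
    """
    present = {int(label) for label in batch_labels}
    out_of_range = sorted(l for l in present if l < 0 or l >= num_class)
    if out_of_range:
        raise ValueError(f"labels outside [0, {num_class}): {out_of_range}")
    return set(range(num_class)) - present


# Hypothetical example: 4 classes overall, but this batch only has classes 1 and 3.
NUM_CLASS = 4
missing = check_batch_labels([1, 3, 3, 1], NUM_CLASS)
print(sorted(missing))  # [0, 2]
```

In this sketch an incomplete batch is reported rather than rejected. Only labels outside the fixed range raise an error, which is the behavior the question is asking for.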

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
