XGBoost iterative training: handling minibatches that lack some of the labels 0,...,C without erroring
When training XGBoost iteratively on data too large to fit in memory, one may want to train in batches. The problem, however, is that a given batch may not contain every label in 0,...,C. This leads to the error: ValueError: The label must consist of integer labels of form 0, 1, 2, ..., [num_class-1]
Is there a way to train XGBoost on batches that contain only a subset of the labels, possibly without the label 0?
The code has structure similar to this:
import xgboost as xgb
from sklearn.metrics import accuracy_score

train = module.trainloader
test = module.valloader
# Train on one minibatch to get started
sample = next(iter(train))
X = xgb.DMatrix(sample[0].numpy(), label=sample[1].numpy())
params = {
    'learning_rate': 0.007,
    'updater': 'refresh',
    'process_type': 'update',
}
# Get initial model training
model = xgb.train(params, dtrain=X)
for i, (trainsample, valsample) in enumerate(zip(train, test)):
    X_train, y_train = trainsample
    X_test, y_test = valsample
    # Wrap each batch in a DMatrix, converting tensors as in the first batch
    X_train = xgb.DMatrix(X_train.numpy(), label=y_train.numpy())
    X_test = xgb.DMatrix(X_test.numpy())
    # Continue training from the previous model on the new batch
    model = xgb.train(params, dtrain=X_train, xgb_model=model)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(accuracy)
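One possible direction, sketched here under assumptions rather than taken from the original post, is to fix the number of classes up front instead of letting it be inferred from each batch. The sketch assumes the full label range 0,...,C is known before training. The helper names `make_params` and `check_batch_labels` are hypothetical; only the `objective` and `num_class` keys are actual XGBoost parameters.

```python
# Hypothetical helpers: only 'objective' and 'num_class' are XGBoost parameter names.

def make_params(num_class, base_params=None):
    """Return a params dict with the class count fixed up front."""
    params = dict(base_params or {})
    params["objective"] = "multi:softmax"  # predict() then yields class labels
    params["num_class"] = num_class
    return params


def check_batch_labels(labels, num_class):
    """Raise if a label is outside 0..num_class-1.

    Returns the sorted list of classes that are absent from this batch.
    """
    present = {int(y) for y in labels}
    out_of_range = sorted(y for y in present if not 0 <= y < num_class)
    if out_of_range:
        raise ValueError(f"labels outside [0, {num_class}): {out_of_range}")
    return sorted(set(range(num_class)) - present)


NUM_CLASS = 5  # assumption: the full dataset uses labels 0..4
params = make_params(NUM_CLASS, {"learning_rate": 0.007})

# A batch that contains neither label 0 nor label 2
missing = check_batch_labels([1, 3, 3, 4], NUM_CLASS)
print(params["num_class"], missing)
```

The same `params` dict would then be passed to every `xgb.train` call. As far as I know, the multiclass objectives in the native API only require each label to lie in [0, num_class), so a batch that lacks some classes should be accepted once `num_class` is set explicitly; this is worth confirming against the installed XGBoost version.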
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
