Strange/unexpected behavior with class_weight and LightGBM
I have an LGBM model that does not use the 'class_weight' parameter.
When I run the model, I currently achieve a mean cross-validated accuracy of 0.81659.
When I set class_weight='balanced', the score drops substantially to 0.78134, a loss of 0.03525.
When I compute the weights manually and pass them as class_weight={0: 0.61378, 2: 0.86751, 1: 4.58652}, the score drops to 0.78129.
I attribute the minor difference between my manual weights and 'balanced' to rounding error, since I arbitrarily rounded the weights to 5 decimal places.
There are three labels distributed as follows: Counter({0: 32259, 2: 22824, 1: 4317})
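LightGBM's documentation describes class_weight='balanced' as weighting each class by n_samples / (n_classes * np.bincount(y)), the same formula scikit-learn's compute_class_weight uses. The minimal pure-Python sketch below (the helper name balanced_class_weights is hypothetical) rebuilds a label list from the counts above and checks that this formula reproduces the manual weights to 5 decimal places, which is consistent with the tiny 0.78134 vs 0.78129 gap being rounding.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Hypothetical helper: weight each class by
    n_samples / (n_classes * count_of_that_class)."""
    counts = Counter(labels)
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Rebuild a label list with the class counts reported in the question.
labels = [0] * 32259 + [2] * 22824 + [1] * 4317
weights = balanced_class_weights(labels)

for cls in sorted(weights):
    # Rounded to 5 places to compare with the manually entered dict.
    print(cls, round(weights[cls], 5))
```

Because the weights depend only on the label counts, computing the dict from y like this avoids typing the values by hand.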
Suppose the labels were distributed as 33%, 33%, 34%. In that case I would expect training with or without class weights to give virtually the same result; there is no reason for class weights to have much, if any, impact.
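To put numbers on this thought experiment, the sketch below applies the same n_samples / (n_classes * count) formula to a hypothetical 33% / 33% / 34% split of the same 59,400 samples and to the actual split. The near-uniform case gives weights of roughly 1.01, 1.01 and 0.98, so weighting barely changes anything there, while the actual split gives class 1 about 7.5 times the weight of class 0. The helper name is hypothetical.

```python
def balanced_class_weights(counts):
    """Hypothetical helper. counts: dict mapping class label -> number of samples.
    Returns n_samples / (n_classes * count) for each class."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical 33% / 33% / 34% split of 59,400 samples.
near_uniform = balanced_class_weights({0: 19602, 1: 19602, 2: 20196})
# Actual split from the question.
actual = balanced_class_weights({0: 32259, 1: 4317, 2: 22824})

for cls in sorted(actual):
    print(f"class {cls}: near-uniform weight {near_uniform[cls]:.3f}, "
          f"actual weight {actual[cls]:.3f}")
```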
But the actual data is quite imbalanced.
I would expect that making the model aware of this imbalance would let it make a more informed, i.e., better, prediction, even if "better" is only a slight improvement.
I certainly would not expect accuracy to drop by 3.5 percentage points.
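For reference, class_weight does not change the metric; it scales each training sample's contribution to the loss. Under the multiclass objective this amounts, roughly, to minimising a weighted cross-entropy, while accuracy still counts every sample equally. With $N$ samples, $K$ classes, $n_k$ samples in class $k$, and $\hat{p}_{i,y_i}$ the predicted probability of sample $i$'s true class:

$$
\mathcal{L}_{\text{weighted}} = -\sum_{i=1}^{N} w_{y_i} \log \hat{p}_{i,y_i},
\qquad
w_k = \frac{N}{K\, n_k},
\qquad
\text{accuracy} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\!\left[\hat{y}_i = y_i\right]
$$

With the counts above, $w_1 / w_0 = n_0 / n_1 = 32259 / 4317 \approx 7.47$, so during training a mistake on class 1 costs about 7.5 times as much as a mistake on class 0, whereas in the accuracy score both cost the same.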
Am I not applying the weights correctly?
Main code block:
```python
import lightgbm as lgb
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# i, j, c, n come from an enclosing hyperparameter loop; X, y and the
# mean_accuracy list are defined elsewhere (not shown in the question).
model = lgb.LGBMClassifier(
    learning_rate=i,
    num_leaves=j,
    objective='multiclass',
    colsample_bytree=c,
    max_bin=512,
    n_estimators=n,
    class_weight={0: 0.61378, 2: 0.86751, 1: 4.58652},
    random_state=13,
    n_jobs=-1,
)
# define evaluation procedure
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=13)
# evaluate model
scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
# summarize performance
# print('Mean accuracy: %.3f' % np.mean(scores))
print(f'For {i} learning_rate, {j} num_leaves, {c} colsample_bytree, {n} n_estimators:\n'
      f' The Mean accuracy is: {np.mean(scores):.5f}, The Standard Deviation is: {np.std(scores):.3f}')
mean_accuracy.append(np.mean(scores))
```
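The code above scores only with scoring='accuracy', which counts every sample equally. One hedged way to see what the weighting is trading off is to score the same cross-validation with accuracy and a class-sensitive metric side by side. The sketch below uses scikit-learn's cross_validate with 'accuracy' and 'balanced_accuracy'. To stay self-contained it makes two assumptions that differ from the asker's setup: it substitutes LogisticRegression for LightGBM, and it uses synthetic data from make_classification with class proportions roughly matching the question. It does not assume any particular outcome; on imbalanced data balanced weighting often gives up some accuracy in exchange for better minority-class performance, but that depends on the data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic 3-class data, roughly 54% / 7% / 39% like the question's labels.
X, y = make_classification(
    n_samples=3000,
    n_features=20,
    n_informative=6,
    n_classes=3,
    weights=[0.54, 0.07, 0.39],
    random_state=13,
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=13)
results = {}
for cw in (None, "balanced"):
    # Stand-in estimator; lgb.LGBMClassifier(objective='multiclass',
    # class_weight=cw) could be dropped in here instead.
    model = LogisticRegression(max_iter=1000, class_weight=cw)
    scores = cross_validate(
        model, X, y, cv=cv, scoring=["accuracy", "balanced_accuracy"]
    )
    results[cw] = {
        "accuracy": scores["test_accuracy"].mean(),
        "balanced_accuracy": scores["test_balanced_accuracy"].mean(),
    }
    print(f"class_weight={cw!r}: "
          f"accuracy={results[cw]['accuracy']:.4f}, "
          f"balanced_accuracy={results[cw]['balanced_accuracy']:.4f}")
```

Reporting both numbers makes it visible whether a drop in accuracy comes with a gain on the minority class, which accuracy alone cannot show.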
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
