'how to use "scikit-learn calibration" after fine-tuning lightgbm
I fine tuned LGBM and applied calibration, but have troubles applying calibration.
I have 1) train, 2) valid, 3) test data.
I trained and fine-tuned LGBM using 1) train data and 2) valid data. Then, I got a best parameter of LGBM.
After then, I want to calibrate, in order to make my model's output can be directly interpreted as a confidence level. But I'm confused in using scikit-learn CalibratedClassifierCV.
In my situation, should I use cv='prefit' or cv=5? Also, should I use train data or valid data fitting CalibratedClassifierCV?
1) uncalibrated_clf but after training
clf = lgb.LGBMClassifier()
clf.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], verbose=True, early_stopping_rounds=20)
2-1) Calibrated_clf
cal_clf = CalibratedClassifierCV(clf, cv='prefit', method='isotonic')
cal_clf.fit(X_valid, y_valid)
2-2) Calibrated_clf
cal_clf = CalibratedClassifierCV(clf, cv=5, method='isotonic')
cal_clf.fit(X_train, y_train)
2-3) Calibrated_clf
cal_clf = CalibratedClassifierCV(clf, cv=5, method='isotonic')
cal_clf.fit(X_valid, y_valid)
Which one is right? All is right, or only one or two is(are) right?
Below is code.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.calibration import calibration_curve
from sklearn.calibration import CalibratedClassifierCV
import lightgbm as lgb
import matplotlib.pyplot as plt
np.random.seed(0)
n_samples = 10000
X, y = make_classification(
n_samples=3*n_samples, n_features=20, n_informative=2,
n_classes=2, n_redundant=2, random_state=32)
#n_samples = N_SAMPLES//10
X_train, y_train = X[:n_samples], y[:n_samples]
X_valid, y_valid = X[n_samples:2*n_samples], y[n_samples:2*n_samples]
X_test, y_test = X[2*n_samples:], y[2*n_samples:]
plt.figure(figsize=(12, 9))
plt.plot([0, 1], [0, 1], '--', color='gray')
# 1) Uncalibrated_clf but fine-tuned on training data
clf = lgb.LGBMClassifier()
clf.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], verbose=True, early_stopping_rounds=20)
y_prob = clf.predict_proba(X_test)[:, 1]
fraction_of_positives, mean_predicted_value = calibration_curve(y_test, y_prob, n_bins=10)
plt.plot(
fraction_of_positives,
mean_predicted_value,
'o-', label='uncalibrated_clf')
# 2-1) Calibrated_clf
cal_clf = CalibratedClassifierCV(clf, cv='prefit', method='isotonic')
cal_clf.fit(X_valid, y_valid)
y_prob1 = cal_clf.predict_proba(X_test)[:, 1]
fraction_of_positives1, mean_predicted_value1 = calibration_curve(y_test, y_prob1, n_bins=10)
plt.plot(
fraction_of_positives1,
mean_predicted_value1,
'o-', label='calibrated_clf1')
# 2-2) Calibrated_clf
cal_clf = CalibratedClassifierCV(clf, cv=5, method='isotonic')
cal_clf.fit(X_train, y_train)
y_prob2 = cal_clf.predict_proba(X_test)[:, 1]
fraction_of_positives2, mean_predicted_value2 = calibration_curve(y_test, y_prob2, n_bins=10)
plt.plot(
fraction_of_positives2,
mean_predicted_value2,
'o-', label='calibrated_clf2')
plt.legend()
# 2-3) Calibrated_clf
cal_clf = CalibratedClassifierCV(clf, cv=5, method='isotonic')
cal_clf.fit(X_valid, y_valid)
y_prob3 = cal_clf.predict_proba(X_test)[:, 1]
fraction_of_positives3, mean_predicted_value3 = calibration_curve(y_test, y_prob3, n_bins=10)
plt.plot(
fraction_of_positives2,
mean_predicted_value2,
'o-', label='calibrated_clf3')
plt.legend()
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
