Multiclass ROC Curve in Python
I was trying to plot a multiclass ROC curve in a way similar to the example in the scikit-learn documentation. My first attempt looked like this:
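from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import label_binarize
y = data["target"]
# Binarize the labels before splitting and training.
Y = label_binarize(y, classes=[0, 1, 2])
n_classes = Y.shape[1]
X_train, X_test, y_train, y_test = train_test_split(predictors, Y, test_size=0.2, random_state=42, stratify=Y)
clf = OneVsRestClassifier(LogisticRegression())
clf.fit(X_train, y_train)
y_score = clf.predict(X_test)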
Then I wondered what would happen if I trained the model first and binarized the output afterwards:
y = data["target"]
# Split and train on the original, non-binarized labels.
X_train, X_test, y_train, y_test = train_test_split(predictors, y, test_size=0.2, random_state=42, stratify=y)
clf = OneVsRestClassifier(LogisticRegression())
clf.fit(X_train, y_train)
# Binarize the predicted labels only after training.
y_score = label_binarize(clf.predict(X_test), classes=[0, 1, 2])
n_classes = y_score.shape[1]
In the second version I still need a binarized version of y_test, since the ROC curve is plotted against the binarized true labels. When I applied label_binarize() after training the model, the performance metrics improved. I suspect that training on the non-binarized y_train works better, but I'm not sure whether doing this is okay.

Can the binarization step be applied after training the model, as I did in the second version? Since binarization is only needed to be able to plot the ROC curve, I think applying it after training shouldn't cause a problem. However, I don't consider myself experienced enough to decide that, and I wonder what others think.
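To make the second approach concrete, here is a minimal sketch of training on the original labels and binarizing only the true test labels for the ROC computation. It is not the code from the question: the iris dataset stands in for `data` and `predictors`, `max_iter=1000` is added to help the solver converge, and the scores come from `predict_proba` instead of `predict`. That last change is a substitution, made because `roc_curve` accepts continuous scores, and hard 0/1 predictions give a curve with only a single intermediate threshold.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import label_binarize

# Stand-in data (assumption): iris has three classes labelled 0, 1, 2.
X, y = load_iris(return_X_y=True)
classes = [0, 1, 2]

# Split and train on the original, non-binarized labels.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

# Binarize only the true test labels, and only for the ROC computation.
y_test_bin = label_binarize(y_test, classes=classes)

# Continuous per-class scores, shape (n_samples, n_classes).
y_score = clf.predict_proba(X_test)

# One-vs-rest ROC curve and AUC for each class.
roc_auc = {}
for i, label in enumerate(classes):
    fpr, tpr, _ = roc_curve(y_test_bin[:, i], y_score[:, i])
    roc_auc[label] = auc(fpr, tpr)

print(roc_auc)
```

Each `fpr`/`tpr` pair can then be passed to a plotting call to draw one curve per class. Whether this setup explains the metric difference seen in the question depends on the actual data, so treat it as a starting point for comparison, not a verdict.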
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
