Multiclass ROC Curve in Python

I was trying to plot a multiclass ROC curve in a similar way to the one in the documentation. My first attempt looked like this:

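from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import label_binarize

# `data` and `predictors` are defined earlier (not shown)
y = data["target"]

# Binarize the labels before splitting and training
Y = label_binarize(y, classes=[0, 1, 2])
n_classes = Y.shape[1]  # one column per class

X_train, X_test, y_train, y_test = train_test_split(predictors, Y, test_size=0.2, random_state=42, stratify=Y)

clf = OneVsRestClassifier(LogisticRegression())
clf.fit(X_train, y_train)

# Hard 0/1 predictions, one column per class
y_score = clf.predict(X_test)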

Then I thought about what would happen if I trained the model first and binarized the output afterwards.

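import numpy as np

y = data["target"]
n_classes = len(np.unique(y))  # y is 1-D here, so count the distinct labels

# Split and train on the original (non-binarized) labels
X_train, X_test, y_train, y_test = train_test_split(predictors, y, test_size=0.2, random_state=42, stratify=y)

clf = OneVsRestClassifier(LogisticRegression())
clf.fit(X_train, y_train)

# Binarize the predicted labels only after training
y_score = label_binarize(clf.predict(X_test), classes=[0, 1, 2])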
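To see how the two variants differ, it may help to count how many classes each one assigns to a test sample. The sketch below is not the original data: it assumes the Iris dataset as a stand-in, and the names `clf_bin`, `clf_raw`, `pred_bin` and `pred_raw` are made up for illustration. When `OneVsRestClassifier` is fitted on a binarized label matrix it treats the task as multilabel, so each column of the prediction is decided independently and a row can end up with zero or several positive labels. Fitted on the original 1-D labels, it returns exactly one class per sample.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import label_binarize

# Iris is only a stand-in for the question's data (assumption).
X, y = load_iris(return_X_y=True)
classes = [0, 1, 2]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Variant 1: fit on binarized labels, handled as a multilabel problem.
clf_bin = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf_bin.fit(X_train, label_binarize(y_train, classes=classes))
pred_bin = np.asarray(clf_bin.predict(X_test))  # each column decided independently

# Variant 2: fit on the original labels, handled as a multiclass problem.
clf_raw = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf_raw.fit(X_train, y_train)
pred_raw = label_binarize(clf_raw.predict(X_test), classes=classes)

# Number of positive labels assigned to each test sample.
labels_per_row_bin = pred_bin.sum(axis=1)
labels_per_row_raw = pred_raw.sum(axis=1)
print("variant 1, distinct label counts per row:", np.unique(labels_per_row_bin))
print("variant 2, distinct label counts per row:", np.unique(labels_per_row_raw))
```

If variant 1 shows rows with a count other than 1, that alone could account for a gap in label-based metrics between the two snippets.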

In the second version I still need a binarized split, because plotting the curve requires the binarized version of y_test. When I applied label_binarize() after training the model, the performance metrics improved. I suspect that training on the non-binarized y_train works better, but I'm not sure whether doing this is valid. Can the binarization step be applied after training the model, as I did in the second version? Since binarization is only needed to be able to plot the ROC curve, I think applying it after training shouldn't cause a problem; however, I don't consider myself experienced enough to decide that. I wonder what others think.
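A minimal sketch of the "train first, binarize only for the curve" idea, under some assumptions: Iris stands in for the real data, `y_test_bin` and `roc_auc` are illustrative names, and the scores come from `predict_proba` instead of `predict`. `roc_curve` sweeps a threshold over the scores, so continuous scores give a curve with many points, whereas hard 0/1 predictions give only a single operating point per class.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import label_binarize

# Iris is only a stand-in for the question's data (assumption).
X, y = load_iris(return_X_y=True)
classes = [0, 1, 2]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train on the original, non-binarized labels.
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

# Binarize only the true test labels, which the per-class ROC needs.
y_test_bin = label_binarize(y_test, classes=classes)

# Continuous per-class scores, shape (n_samples, n_classes).
y_score = clf.predict_proba(X_test)

# One-vs-rest ROC curve and AUC for each class.
roc_auc = {}
for i, label in enumerate(classes):
    fpr, tpr, _ = roc_curve(y_test_bin[:, i], y_score[:, i])
    roc_auc[label] = auc(fpr, tpr)
    print(f"class {label}: AUC = {roc_auc[label]:.3f}")
```

The `fpr` and `tpr` arrays from each iteration can be passed to any plotting library to draw the per-class curves.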



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
