Why am I getting different roc_auc_score values?
I am working on a binary classification problem and using the following code to compute the ROC AUC score at each fold of 5-fold cross-validation:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import KFold

cv = KFold(n_splits=5, shuffle=True, random_state=41)
classifier = RandomForestClassifier()
tprs = []
aucs = []
mean_fpr = np.linspace(0, 1, 100)
for i, (train, test) in enumerate(cv.split(X, y)):
    clf = classifier.fit(X[train], y[train])
    # Probability of the positive class for the held-out fold
    prediction = clf.predict_proba(X[test])[:, 1]
    fpr, tpr, t = roc_curve(y[test], prediction)
    tprs.append(np.interp(mean_fpr, fpr, tpr))
    roc_auc = auc(fpr, tpr)
    aucs.append(roc_auc)
    plt.plot(fpr, tpr, lw=2, alpha=1, label='Fold %d (AUC=%0.4f)' % (i + 1, roc_auc))

plt.plot([0, 1], [0, 1], linestyle='--', lw=2, color='k',
         label='Chance', alpha=1)
mean_tpr = np.mean(tprs, axis=0)
mean_auc = auc(mean_fpr, mean_tpr)
std_auc = np.std(aucs)
plt.plot(mean_fpr, mean_tpr, color='b',
         label=r'Mean ROC (AUC = %0.4f $\pm$ %0.4f)' % (mean_auc, std_auc),
         lw=2, alpha=1)
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC')
plt.legend(loc="lower right")
plt.show()
In addition to the ROC AUC scores, I also want to calculate other performance metrics such as accuracy, precision, and recall at each fold. I used the following code for that, but the ROC AUC scores it reports differ from the ones I obtained above.
from sklearn import model_selection
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, roc_auc_score

scoring = {'accuracy': make_scorer(accuracy_score),
           'recall': make_scorer(recall_score),
           'precision': make_scorer(precision_score),
           'roc_auc_score': make_scorer(roc_auc_score)}
results = model_selection.cross_validate(estimator=classifier, X=X, y=y, cv=cv, scoring=scoring)
Can anyone point out where I am going wrong? And how can I calculate the other metrics at each fold?
Solution 1:[1]
Have a look at the make_scorer function:
metrics.make_scorer(score_func, *, greater_is_better=True, needs_proba=False, needs_threshold=False, **kwargs)
By default, the argument needs_proba is set to False. For roc_auc_score, however, it must be set to True. When it is False, the scorer passes the hard class labels from predict to roc_auc_score instead of the predicted probabilities, so the ROC AUC is computed from a single confidence threshold rather than the full range of thresholds. That is why the two results differ.
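As a sketch of the fix (using synthetic data from make_classification purely for illustration, since the asker's X and y are not shown): the built-in 'roc_auc' string scorer already uses predict_proba internally, so it matches the manual roc_curve/auc computation and avoids the needs_proba keyword, which is deprecated in recent scikit-learn versions in favor of response_method. cross_validate also returns one score per metric per fold, which answers the second question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_validate

# Synthetic stand-in for the asker's data
X, y = make_classification(n_samples=300, random_state=41)

cv = KFold(n_splits=5, shuffle=True, random_state=41)
classifier = RandomForestClassifier(random_state=41)

# Built-in scorer names: 'roc_auc' calls predict_proba internally,
# so it agrees with the manual roc_curve/auc loop above.
scoring = {'accuracy': 'accuracy',
           'recall': 'recall',
           'precision': 'precision',
           'roc_auc': 'roc_auc'}

results = cross_validate(estimator=classifier, X=X, y=y, cv=cv, scoring=scoring)

# Each entry in results is an array with one value per fold
for fold, score in enumerate(results['test_roc_auc'], start=1):
    print('Fold %d: ROC AUC = %.4f' % (fold, score))
```

The same pattern gives the per-fold accuracy, precision, and recall via results['test_accuracy'], results['test_precision'], and results['test_recall'].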
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Philipp Steiner |
