Get the area under a ROC curve in Python pyod?

I have data for 5,000 observations. I split the dataset in two: the variables (X_train) and the labeled target (y_train). I am using pyod because it seems to be the most popular Python library for anomaly detection.

I fit the model to the data with the following code:

from pyod.models.knn import KNN
from pyod.utils import evaluate_print

clf = KNN(n_neighbors=10, method='mean', metric='euclidean')
clf.fit(X_train)
scores = clf.decision_scores_

The model is now fitted and I have the outlier score of each observation (higher means more anomalous) stored in scores. I manually calculated the area under the ROC curve from these scores and it returned 0.69.

I noticed this is the same result when using:

evaluate_print('KNN with k=10', y=y_train, y_pred=scores)

This prints: KNN with k=10 ROC:0.69, precision @ rank n:0.1618.

I want to know if there is a specific function in pyod which would return only the 0.69.



Solution 1:[1]

The pyod package itself computes the ROC AUC with sklearn.metrics.roc_auc_score. You can see that in Benchmark.ipynb in the notebooks folder of the pyod repository. So to get only the ROC AUC, call that function directly on your labels and scores:

from sklearn.metrics import roc_auc_score

# pass the number of digits to round(); without it the value is rounded to an integer
roc = round(roc_auc_score(y_train, scores), 2)
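As a minimal, self-contained sketch of this approach, assuming scikit-learn is installed: the helper name auc_only and the toy labels and scores below are illustrative only, standing in for y_train and clf.decision_scores_ from the question.

```python
from sklearn.metrics import roc_auc_score


def auc_only(y_true, outlier_scores, ndigits=2):
    """Hypothetical helper: return only the ROC AUC, rounded to ndigits."""
    # roc_auc_score expects the true binary labels first, then the scores,
    # where a higher score means "more likely to be the positive class".
    return round(float(roc_auc_score(y_true, outlier_scores)), ndigits)


# Toy data (assumption): 1 marks an outlier, 0 an inlier.
y_true = [0, 0, 1, 1]
outlier_scores = [0.1, 0.4, 0.35, 0.8]

auc = auc_only(y_true, outlier_scores)
print(auc)
```

With real data you would pass y_train and scores instead of the toy lists; the result is a plain float that can be stored or compared, rather than a printed string.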

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Arvind Kumar