How do I get the area under the ROC curve (ROC AUC) in Python with pyod?
I have data for 5,000 observations. I split the dataset into the features (X_train) and the labels (y_train). I am using pyod because it seems to be the most popular Python library for anomaly detection.
I fit the model to the data with the following code:
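```python
from pyod.models.knn import KNN
from pyod.utils.data import evaluate_print

# Use the mean distance to the 10 nearest neighbours as the outlier score
clf = KNN(n_neighbors=10, method='mean', metric='euclidean')
clf.fit(X_train)  # unsupervised: the labels are not used for fitting
scores = clf.decision_scores_  # raw outlier score for each training row
```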
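`decision_scores_` holds one raw outlier score per training row. Higher values mean more anomalous, and the values are not probabilities. With `method='mean'`, pyod's KNN bases the score on the average distance to the k nearest neighbours. The sketch below reproduces that idea with brute-force NumPy on made-up data. `knn_mean_scores` is a hypothetical helper, not pyod's implementation, and pyod's handling of ties and search structures may differ.

```python
import numpy as np


def knn_mean_scores(X, n_neighbors=10):
    """Mean Euclidean distance from each row of X to its n_neighbors
    nearest *other* rows. Larger values mean 'more isolated'.

    Brute-force O(n^2) sketch for illustration only (not pyod's code).
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances, shape (n, n)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Ignore each point's zero distance to itself
    np.fill_diagonal(dist, np.inf)
    # Average the k smallest distances in each row
    nearest = np.sort(dist, axis=1)[:, :n_neighbors]
    return nearest.mean(axis=1)


# Made-up data: 200 inliers around the origin plus 5 far-away outliers
rng = np.random.default_rng(0)
inliers = rng.normal(0.0, 1.0, size=(200, 2))
outliers = rng.uniform(6.0, 8.0, size=(5, 2))
X_demo = np.vstack([inliers, outliers])

scores_demo = knn_mean_scores(X_demo, n_neighbors=10)
# Indices of the 5 highest scores (expected to be the injected outliers)
print(np.argsort(scores_demo)[-5:])
```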
The model is now fitted, and `scores` holds the outlier score of each training observation. Higher means more likely to be an outlier; these are raw scores, not probabilities. I calculated the area under the ROC curve manually and got 0.69.
I noticed that this is the same result I get from:

```python
evaluate_print('KNN with k=10', y=y_train, y_pred=scores)
```

which prints:

```
KNN with k=10 ROC:0.69, precision @ rank n:0.1618
```
Is there a specific function in pyod that returns only the ROC AUC value (0.69)?
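For reference, the manual calculation mentioned in the question can be done without any library. ROC AUC equals the fraction of (outlier, inlier) pairs in which the outlier receives the higher score, with ties counting as one half. This is the Mann–Whitney U statistic divided by n_pos·n_neg. Below is a minimal pure-Python sketch. It assumes pyod's label convention (1 = outlier, 0 = inlier), and `roc_auc` is a hypothetical helper name. The pairwise loop is quadratic, which is still fine for a few thousand rows.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive (label 1)
    scores higher than a random negative (label 0); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels and outlier scores
labels_demo = [0, 0, 1, 1, 0, 1]
scores_list = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
print(roc_auc(labels_demo, scores_list))  # 8 of 9 pairs ranked correctly
```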
Solution 1:[1]
pyod itself computes the ROC AUC with `sklearn.metrics.roc_auc_score`; you can see this in Benchmark.ipynb in the notebooks folder of the pyod repository. To get only the ROC AUC value, call that function directly:
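```python
from sklearn.metrics import roc_auc_score
# Pass ndigits: a plain round() would turn 0.69 into 1
roc = round(roc_auc_score(y_train, scores), 4)
# For a held-out set: roc_auc_score(y_test, clf.decision_function(X_test))
```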
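The sketch below is a quick check of why the `ndigits` argument matters and that `roc_auc_score` gives the pairwise-ranking AUC. The labels and scores are made up; they stand in for `y_train` and `clf.decision_scores_`. Only `roc_auc_score` is a real sklearn call here.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Made-up labels (1 = outlier, 0 = inlier) and outlier scores
rng = np.random.default_rng(42)
y_demo = np.r_[np.zeros(90, dtype=int), np.ones(10, dtype=int)]
scores_demo = np.r_[rng.normal(0.0, 1.0, 90), rng.normal(1.0, 1.0, 10)]

auc = float(roc_auc_score(y_demo, scores_demo))
auc_4dp = round(auc, 4)  # float rounded to 4 decimal places
auc_int = round(auc)     # no ndigits: nearest integer, 0 or 1

print(auc, auc_4dp, auc_int)
print(round(0.69), round(0.69, 4))  # prints: 1 0.69
```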
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Arvind Kumar |
