sklearn classification report

I am training an ELECTRA model with TensorFlow on a multi-label task. The AUROC of each individual label is

AUROC per tag
morality_binary: 0.8840802907943726
emotion_binary: 0.8690611124038696
positive_binary: 0.9115268588066101
negative_binary: 0.9200447201728821
care_binary: 0.9266915321350098
fairness_binary: 0.8638730645179749
authority_binary: 0.8471786379814148
sanctity_binary: 0.9040042757987976
harm_binary: 0.9046630859375
injustice_binary: 0.8968375325202942
betrayal_binary: 0.846387505531311
subversion_binary: 0.7741811871528625
degradation_binary: 0.9601025581359863

But when I run the sklearn classification report:

import numpy as np
from sklearn.metrics import classification_report

THRESHOLD = 0.5

y_pred = predictions.numpy()
y_true = labels.numpy()

upper, lower = 1, 0

# binarise the predicted probabilities at a single fixed threshold
y_pred = np.where(y_pred > THRESHOLD, upper, lower)

print(classification_report(
  y_true, 
  y_pred, 
  target_names=LABEL_COLUMNS, 
  zero_division=0
))

... five of the labels turn out with an F-score of 0:

                    precision    recall  f1-score   support

   morality_binary       0.72      0.73      0.73       347
    emotion_binary       0.66      0.73      0.69       303
   positive_binary       0.71      0.76      0.73       242
   negative_binary       0.70      0.62      0.65       141
       care_binary       0.67      0.60      0.63       141
   fairness_binary       0.55      0.53      0.54       166
  authority_binary       0.00      0.00      0.00        49
   sanctity_binary       0.00      0.00      0.00        23
       harm_binary       0.48      0.32      0.39        50
  injustice_binary       0.62      0.56      0.59        97
   betrayal_binary       0.00      0.00      0.00        30
 subversion_binary       0.00      0.00      0.00         8
degradation_binary       0.00      0.00      0.00        10

Can someone explain to me how this is possible? I can understand a low f-score, but 0?



Solution 1:[1]

I assume 0 is negative and 1 is positive.

AUROC calculates the area under the ROC curve as a measure of how well a classifier performs (a score of 0.5 corresponds to a random, coin-flip model). To draw the ROC curve, you calculate two quantities at many different classification thresholds for distinguishing positive from negative examples:

  • y-axis: True positive rate (TPR) - the fraction of positive examples the model predicted as positive.
  • x-axis: False positive rate (FPR) - the fraction of negative examples the model predicted as positive.

TPR is also called recall. We calculate this using the following formula:

TPR = True positives / (True positives + False negatives) = True positives / All positives

So the only way TPR (recall) can be 0 is when TP is 0. When TP is 0, precision is 0 as well, since we calculate precision using the following formula:

Precision = True positives / (True positives + False positives)

which is 0 whenever TP is 0. (If the model makes no positive predictions at all, the denominator is also 0; your zero_division=0 setting makes sklearn report 0 in that case instead of raising a warning.) And since the F1-score is the harmonic mean of precision and recall, it is 0 as soon as TP is 0.

This is exactly what is happening for the five labels with an F-score of 0: at the single threshold of 0.5 that you picked in your code, the model predicts no positives for them, so TP = 0 and precision, recall and F1 all collapse to 0. AUROC, on the other hand, is computed across every possible threshold, not just 0.5 - it only measures how well the model ranks positives above negatives. A single fixed threshold is therefore not representative of the ROC curve or of the AUROC value.
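To make this concrete, here is a minimal sketch with made-up scores (the data below is hypothetical, not from the question): the model ranks every positive example above every negative one, so AUROC is 1.0, yet every predicted probability stays below 0.5, so recall, precision and F1 at that threshold are all 0.

import numpy as np
from sklearn.metrics import roc_auc_score, precision_score, recall_score, f1_score

# hypothetical ground truth and predicted probabilities for one rare label
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_prob = np.array([0.05, 0.08, 0.10, 0.12, 0.15, 0.18, 0.30, 0.35, 0.40, 0.45])

print(roc_auc_score(y_true, y_prob))                     # 1.0 - perfect ranking

y_pred = np.where(y_prob > 0.5, 1, 0)                    # same rule as in the question
print(recall_score(y_true, y_pred, zero_division=0))     # 0.0 - no true positives
print(precision_score(y_true, y_pred, zero_division=0))  # 0.0 - no positive predictions
print(f1_score(y_true, y_pred, zero_division=0))         # 0.0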

I suggest you plot the ROC curve for each label and try different values for your classification threshold. Your AUROC values show that the model ranks examples much better than a random one, so you should be able to find per-label thresholds that give non-zero F-scores.
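One way to do that (a sketch, not the asker's code; best_thresholds is a hypothetical helper, and the commented usage assumes the question's predictions, labels and LABEL_COLUMNS): compute each label's ROC curve on a validation set and take the threshold that maximises Youden's J statistic (TPR - FPR).

import numpy as np
from sklearn.metrics import roc_curve

def best_thresholds(y_true, y_prob, label_names):
    """Pick, for each label, the threshold maximising Youden's J (TPR - FPR)."""
    thresholds = {}
    for i, name in enumerate(label_names):
        fpr, tpr, thr = roc_curve(y_true[:, i], y_prob[:, i])
        thresholds[name] = thr[np.argmax(tpr - fpr)]
    return thresholds

# hypothetical usage with the variables from the question:
# THRESHOLDS = best_thresholds(labels.numpy(), predictions.numpy(), LABEL_COLUMNS)
# y_pred = np.stack(
#     [(predictions.numpy()[:, i] > THRESHOLDS[name]).astype(int)
#      for i, name in enumerate(LABEL_COLUMNS)],
#     axis=1,
# )

Maximising F1 on a held-out set instead of Youden's J is equally valid; the important part is that the threshold is chosen per label rather than fixed at 0.5 for all of them.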


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Ethan Van den Bleeken