torchmetrics: represent uncertainty

I am using torchmetrics to calculate metrics such as F1 score, Recall, Precision and Accuracy in a multilabel classification setting. With randomly initialized weights, the softmax output (i.e., the prediction) might look like this with a batch size of 8:

import torch
y_pred = torch.tensor([[0.1944, 0.1931, 0.2184, 0.1968, 0.1973],
                       [0.2182, 0.1932, 0.1945, 0.1973, 0.1968],
                       [0.2182, 0.1932, 0.1944, 0.1973, 0.1969],
                       [0.2182, 0.1931, 0.1945, 0.1973, 0.1968],
                       [0.2184, 0.1931, 0.1944, 0.1973, 0.1968],
                       [0.2181, 0.1932, 0.1941, 0.1970, 0.1976],
                       [0.2183, 0.1932, 0.1944, 0.1974, 0.1967],
                       [0.2182, 0.1931, 0.1945, 0.1973, 0.1968]])

With the correct labels (multi-hot encoded, since this is multilabel):

y_true = torch.tensor([[0, 0, 1, 0, 1],
                       [0, 1, 0, 0, 1],
                       [0, 1, 0, 0, 1],
                       [0, 0, 1, 1, 0],
                       [0, 0, 1, 1, 0],
                       [0, 1, 0, 1, 0],
                       [0, 1, 0, 1, 0],
                       [0, 0, 1, 0, 1]])

And I can calculate the metrics by taking argmax:

import torchmetrics
# torchmetrics >= 0.11 requires an explicit task; micro averaging gives the value below
torchmetrics.functional.f1_score(y_pred.argmax(-1), y_true.argmax(-1),
                                 task="multiclass", num_classes=5, average="micro")

output:

tensor(0.1250)

The first prediction happens to be correct while the rest are wrong (1 correct out of 8, hence the 0.1250 under micro averaging). However, none of the predicted probabilities are above 0.3, which means the model is generally uncertain about its predictions. I would like to encode this uncertainty and say that the F1 score should be 0.0, because none of the predicted probabilities exceed a 0.3 threshold.
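One way I can imagine doing this (a sketch, not a built-in torchmetrics option as far as I know): map every prediction whose top softmax probability falls below the threshold to an extra "abstain" class index, so it can never match a true label.

import torch
import torchmetrics

threshold = 0.3
num_classes = y_pred.shape[-1]  # 5 real classes

# Send low-confidence predictions to an extra "abstain" index
# that never occurs among the true labels.
pred_labels = y_pred.argmax(-1)
pred_labels[y_pred.max(-1).values < threshold] = num_classes

# num_classes + 1 makes the abstain index a valid class for the metric.
torchmetrics.functional.f1_score(pred_labels, y_true.argmax(-1),
                                 task="multiclass", num_classes=num_classes + 1,
                                 average="micro")
# tensor(0.) -- every row here falls below the 0.3 threshold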


Is this possible with torchmetrics or the sklearn library?
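For the sklearn half of the question, the same abstain-label trick seems to carry over (again a sketch, not a built-in thresholding option): sklearn.metrics.f1_score treats the extra index as just another label.

from sklearn.metrics import f1_score

pred_np = y_pred.argmax(-1).numpy()
pred_np[y_pred.max(-1).values.numpy() < 0.3] = y_pred.shape[-1]  # abstain index
f1_score(y_true.argmax(-1).numpy(), pred_np, average="micro")
# 0.0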

Is this common practice?


