Multiclass semantic segmentation model evaluation

I am working on a project on multiclass semantic segmentation. I have built a model that outputs fairly decent segmented images as the loss decreases. However, I cannot evaluate the model's performance with metrics such as mean IoU or the Dice coefficient. For binary semantic segmentation it was easy to just set a threshold of 0.5 to classify each output pixel as object or background, but that does not work for multiclass semantic segmentation. Could you please tell me how to compute the aforementioned metrics for my model? Any help will be highly appreciated!

By the way, I am using the PyTorch framework and the CamVid dataset.



Solution 1:[1]

If anyone is interested in this answer, please also look at this issue. The author of the issue points out that mIoU can be computed in a different way (and that method is more widely accepted in the literature). So, consider that before using this implementation in any formal publication.

Basically, the method suggested by the issue poster is to accumulate the intersections and unions separately over the entire dataset and divide them only at the final step. The original answer below instead computes the intersection and union for each batch of images, divides them to get the IoU of the current batch, and then takes the mean of the per-batch IoUs over the entire dataset.

However, the original method below is problematic because the final mean IoU varies with the batch size. The mIoU from the method mentioned in the issue does not, since the separate accumulation makes the batch size irrelevant (though a larger batch size can still speed up evaluation). A sketch of that accumulation scheme is given next.
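For concreteness, here is a minimal sketch of the accumulation scheme. This is a hypothetical helper, not the issue author's code; it assumes a loader yielding (image, label) batches with integer class-index labels and a model that outputs per-class scores:

import torch

def mIOU_accumulated(loader, model, num_classes=19):
    # Sum intersections and unions per class over the whole dataset
    # and divide only once at the end, so the result does not depend
    # on the batch size.
    intersection = torch.zeros(num_classes, dtype=torch.long)
    union = torch.zeros(num_classes, dtype=torch.long)
    model.eval()
    with torch.no_grad():
        for image, label in loader:
            pred = model(image).argmax(dim=1).view(-1)
            label = label.view(-1)
            for c in range(num_classes):
                pred_c = (pred == c)
                label_c = (label == c)
                intersection[c] += (pred_c & label_c).sum()
                union[c] += (pred_c | label_c).sum()
    # Classes that never appear in the dataset have union 0; exclude them.
    valid = union > 0
    return (intersection[valid].float() / union[valid].float()).mean().item()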

Original answer:

Given below is an implementation of mean IoU (Intersection over Union) in PyTorch.

import numpy as np
import torch
import torch.nn.functional as F

def mIOU(label, pred, num_classes=19):
    # pred: raw model output of shape (N, C, H, W);
    # label: ground-truth class indices of shape (N, H, W).
    pred = F.softmax(pred, dim=1)
    pred = torch.argmax(pred, dim=1)  # (N, H, W) map of predicted class indices
    iou_list = list()
    present_iou_list = list()

    pred = pred.view(-1)
    label = label.view(-1)
    # Note: The following loop goes from 0 to (num_classes - 1), and
    # ignore_index is num_classes, so ignored pixels never match any
    # sem_class and are not considered in the computation of IoU.
    for sem_class in range(num_classes):
        pred_inds = (pred == sem_class)
        target_inds = (label == sem_class)
        if target_inds.long().sum().item() == 0:
            # Class absent from the target: mark its IoU as NaN so it
            # can be identified later, and leave it out of the mean.
            iou_now = float('nan')
        else:
            intersection_now = (pred_inds[target_inds]).long().sum().item()
            union_now = pred_inds.long().sum().item() + target_inds.long().sum().item() - intersection_now
            iou_now = float(intersection_now) / float(union_now)
            present_iou_list.append(iou_now)
        iou_list.append(iou_now)  # per-class IoUs (NaN for absent classes)
    return np.mean(present_iou_list)

Your model's prediction will be a tensor of per-class scores (one channel per class), so first take the softmax (if your model doesn't apply it already) followed by the argmax to get the index of the most probable class at each pixel. (Since softmax is monotonic, taking the argmax of the raw logits gives the same result; the softmax is kept here for clarity.) Then, we calculate the IoU for each class and take the mean over them at the end.
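As a quick sanity check, here is a hypothetical call with random tensors. The default num_classes=19 matches Cityscapes; CamVid is commonly evaluated with 11 or 32 classes, so pass the value matching your label set:

import torch

num_classes = 32                                      # e.g. CamVid's full label set; adjust to yours
pred = torch.randn(4, num_classes, 360, 480)          # raw model output: (batch, classes, H, W)
label = torch.randint(0, num_classes, (4, 360, 480))  # ground truth: one class index per pixel

print(mIOU(label, pred, num_classes=num_classes))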

We reshape both the prediction and the label into 1-D vectors (reportedly, this makes the computation faster). For each class, we first identify the pixels of that class using pred_inds = (pred == sem_class) and target_inds = (label == sem_class). The resulting pred_inds and target_inds are boolean masks that are True at pixels labelled as that particular class and False everywhere else.

Then, there is the possibility that the target does not contain that particular class at all. This makes that class's IoU undefined, since the class is not present in the target. So, we assign such classes a NaN IoU (so they can be identified later) and leave them out of the calculation of the mean.

If the particular class is present in the target, then pred_inds[target_inds] selects the prediction mask at the target's pixels of that class, giving a vector that is True exactly where the prediction and the target agree on this class and False otherwise. Taking the sum of all its elements gives us the intersection.

If we add all the elements of pred_inds and target_inds, we get the union plus the intersection of that particular class's pixels, so we subtract the already-calculated intersection to get the union. Dividing the intersection by the union then gives the IoU of that particular class, which we append to the list of valid IoUs. A tiny worked example follows.
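Here are these steps for a single class, with made-up masks:

import torch

pred_inds   = torch.tensor([True, True, False, False, True])   # prediction mask for the class
target_inds = torch.tensor([True, False, False, True, True])   # target mask for the class

intersection = pred_inds[target_inds].long().sum().item()      # = 2 (pixels where both are True)
union = pred_inds.long().sum().item() + target_inds.long().sum().item() - intersection  # 3 + 3 - 2 = 4
print(intersection / union)                                    # IoU = 0.5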

At the end, you take the mean of the entire list of valid IoUs to get the mIoU. If you want the Dice coefficient, you can calculate it in a similar fashion; a sketch is given below.
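For instance, here is a minimal sketch of a mean Dice coefficient mirroring the structure of the mIOU function above (my adaptation, not part of the original answer). Only the formula changes: Dice = 2 * intersection / (|pred| + |target|).

import numpy as np
import torch
import torch.nn.functional as F

def mean_dice(label, pred, num_classes=19):
    pred = torch.argmax(F.softmax(pred, dim=1), dim=1).view(-1)
    label = label.view(-1)
    present_dice_list = []
    for sem_class in range(num_classes):
        pred_inds = (pred == sem_class)
        target_inds = (label == sem_class)
        if target_inds.long().sum().item() == 0:
            continue  # class absent from the target, skipped like the NaN IoUs above
        intersection_now = (pred_inds & target_inds).long().sum().item()
        denom_now = pred_inds.long().sum().item() + target_inds.long().sum().item()
        present_dice_list.append(2.0 * intersection_now / denom_now)
    return np.mean(present_dice_list)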

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

[1] Solution 1: Stack Overflow