Segmentation loss function
pred = model(x)['out']
loss_value = loss(pred, target.squeeze(1))
Hi, I am trying to train deeplabv3_resnet50 from PyTorch for two classes (background and dog, just to try to make the predictions better). As I understand it, pred is a tensor with shape (batch, num_classes, height, width). Now I need to choose a loss function, for example torch.nn.CrossEntropyLoss. It needs the raw input and only ONE segmentation mask containing all classes as values. So why is there only one mask for two classes, and what if there are more than two classes? I thought it needed two masks, one where background is 1 and dog is 0, and one the other way around. How does CrossEntropyLoss work with this? Maybe this will help explain it to me.
PS. I asked this question because, as I understand it, DiceBCELoss wants two segmentation masks instead of one.
Solution 1:[1]
When using nn.CrossEntropyLoss, you are expected to provide the labels in dense format (integer class indices), not in one-hot-encoding format, which is what you've shown in your illustration for the target tensor. As a general rule of thumb, nn.CrossEntropyLoss expects the target to have one dimension less than the prediction tensor. That is, it has exactly the same sizes in all dimensions except the one that holds the logit values for each class. In other words, the target tensor contains the integer label at each position where the prediction tensor has one score map per class. This is why a single mask is enough for any number of classes: each pixel stores one index between 0 and C - 1. For example, for a 2D prediction the prediction tensor has shape (N, C, H, W), while the target tensor is expected to have shape (N, H, W).
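As a rough illustration of the shapes described above, here is a minimal sketch. The random tensors stand in for `model(x)['out']` and for a real mask, and the sizes `N, C, H, W` and the name `criterion` are assumptions made up for the example, not values from the question.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical sizes: batch, classes (background, dog), height, width.
N, C, H, W = 4, 2, 8, 8

# Stand-in for model(x)['out']: raw, unnormalized logits, one map per class.
pred = torch.randn(N, C, H, W)

# Stand-in for a mask loaded as (N, 1, H, W) with integer class indices.
mask = torch.randint(0, C, (N, 1, H, W))

# Dense target: drop the channel dimension to get (N, H, W), dtype long.
target = mask.squeeze(1).long()

criterion = nn.CrossEntropyLoss()
loss_value = criterion(pred, target)  # scalar tensor

# If a mask is stored one-hot as (N, C, H, W) instead, taking the argmax
# over the class dimension recovers the dense (N, H, W) form.
one_hot = F.one_hot(target, num_classes=C).permute(0, 3, 1, 2)
dense_again = one_hot.argmax(dim=1)
```

The same call works unchanged for more than two classes: only `C` and the range of integers stored in the mask change, not the number of masks.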
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Ivan |
