Differences between F.relu(X) and torch.max(X, 0)
I am trying to implement the following loss function
To me, the most straightforward implementation would be using torch.max:
losses = torch.max(ap_distances - an_distances + margin, torch.Tensor([0]))
However, I have seen other implementations on GitHub that use F.relu:
losses = F.relu(ap_distances - an_distances + margin)
They give essentially the same output, but I wonder whether there is any fundamental difference between the two methods.
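The equivalence of the two forms can be checked directly; the distance values below are hypothetical, since the question does not include concrete data:

```python
import torch
import torch.nn.functional as F

# Hypothetical anchor-positive / anchor-negative distances (assumed values)
ap_distances = torch.tensor([0.9, 0.2, 1.5])
an_distances = torch.tensor([1.0, 0.8, 0.3])
margin = 0.5

# The two formulations from the question
losses_max = torch.max(ap_distances - an_distances + margin,
                       torch.tensor([0.0]))
losses_relu = F.relu(ap_distances - an_distances + margin)

print(torch.equal(losses_max, losses_relu))  # forward outputs are identical
```

Both compute max(x, 0) elementwise, so the forward pass is the same; the question is about what happens under autograd.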
Solution 1:[1]
torch.max is reported as not differentiable according to this discussion.
A loss function needs to be continuous and (sub)differentiable for backprop to work. relu has a well-defined subgradient everywhere (taken as 0 at the origin), hence its use in loss functions.
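Worth noting: for the elementwise max against a constant zero, autograd does in practice produce the same subgradient as relu at every nonzero input (exactly at zero the two subgradient conventions can differ). A small check, with illustrative values chosen to avoid the tie at zero:

```python
import torch
import torch.nn.functional as F

# Inputs straddling zero, but none exactly at zero (assumed values)
x = torch.tensor([-1.0, 0.5, 2.0], requires_grad=True)

# Gradient through torch.max(x, 0)
torch.max(x, torch.tensor([0.0])).sum().backward()
grad_max = x.grad.clone()

# Gradient through F.relu(x)
x.grad = None
F.relu(x).sum().backward()
grad_relu = x.grad.clone()

print(grad_max)   # tensor([0., 1., 1.])
print(grad_relu)  # tensor([0., 1., 1.])
```

Away from the kink, both propagate gradient 1 where the input is positive and 0 where it is negative.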
Solution 2:[2]
If you are trying to limit the output value, as in ReLU6 (https://pytorch.org/docs/stable/generated/torch.nn.ReLU6.html), you can use:
import torch.nn.functional as F
x1 = F.hardtanh(x, min_val, max_val)
This preserves the differentiability of the model: values below min_val are clamped to min_val and values above max_val are clamped to max_val (the appropriate min and max values depend on your use case).
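For instance, clipping to [0, 6] reproduces the ReLU6 range; the input values here are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

# Hypothetical inputs spanning both sides of the clipping range
x = torch.linspace(-3.0, 9.0, steps=5)  # tensor([-3., 0., 3., 6., 9.])

# Clamp to [0, 6], i.e. the same range as ReLU6
x1 = F.hardtanh(x, min_val=0.0, max_val=6.0)

print(x1)  # tensor([0., 0., 3., 6., 6.])
```

Unlike a hard cutoff applied after training, hardtanh participates in autograd, passing gradient 1 inside the range and 0 outside it.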

Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | scarecrow |
| Solution 2 | Jeremy Caney |

