Differences between F.relu(X) and torch.max(X, 0)

I am trying to implement the following loss function

loss = max(ap_distance - an_distance + margin, 0)

To me, the most straightforward implementation would be using torch.max:

losses = torch.max(ap_distances - an_distances + margin, torch.Tensor([0]))

However, I saw other implementations on GitHub using F.relu:

losses = F.relu(ap_distances - an_distances + margin)

They give essentially the same output, but I wonder whether there is any fundamental difference between the two methods.



Solution 1:[1]

torch.max is sometimes described as non-differentiable (see this discussion), but for this element-wise use the distinction from F.relu is smaller than that suggests: both functions are differentiable everywhere except at zero, and autograd applies a subgradient at that point, so backprop works with either. In practice F.relu is preferred because it states the intent directly and avoids allocating an extra zero tensor. Note that the reduction form torch.max(X) (maximum over a dimension), which routes gradient only to the single maximal element, is a different operation from the element-wise torch.max(X, other) used here.
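A quick sketch illustrating the point above: for the same (hypothetical) distance values, the torch.max and F.relu formulations produce identical losses and identical gradients under autograd.

```python
import torch
import torch.nn.functional as F

# Hypothetical anchor-positive / anchor-negative distances and margin
ap = torch.tensor([0.5, 2.0, 1.0], requires_grad=True)
an = torch.tensor([1.0, 0.5, 3.0], requires_grad=True)
margin = 1.0

# Variant 1: element-wise max against zero
loss_max = torch.max(ap - an + margin, torch.zeros_like(ap)).sum()
loss_max.backward()
grad_max = ap.grad.clone()

# Reset gradients and try variant 2: F.relu clamps negatives the same way
ap.grad = None
an.grad = None
loss_relu = F.relu(ap - an + margin).sum()
loss_relu.backward()

# Both backward passes yield the same gradients for these inputs
print(torch.equal(grad_max, ap.grad))  # True
```

The only point where the two can disagree is at an exact tie (the pre-clamp value being exactly zero), where subgradient conventions may differ; away from that point the gradients match element for element.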

Solution 2:[2]

If you are trying to limit the output value, as in ReLU6 (https://pytorch.org/docs/stable/generated/torch.nn.ReLU6.html), you can use F.hardtanh:

import torch.nn.functional as F

x1 = F.hardtanh(x, min_val=min_value, max_val=max_value)

This preserves the differentiability of the model. The output is clipped to the range [min_val, max_val], giving a ReLU6-like saturating curve (the exact bounds depend on the values you choose).
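As a concrete sketch (the input values and bounds here are illustrative, not prescribed by the answer), F.hardtanh with min_val=0 and max_val=6 reproduces ReLU6 exactly:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, 0.0, 3.0, 8.0])

# Clip to [0, 6]: negatives become 0, values above 6 saturate at 6
y = F.hardtanh(x, min_val=0.0, max_val=6.0)
print(y)  # tensor([0., 0., 3., 6.])

# For this particular range, the built-in relu6 gives the same result
print(torch.equal(y, F.relu6(x)))  # True
```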

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: scarecrow
Solution 2: Jeremy Caney