Differences between F.relu(X) and torch.max(X, 0)

I am trying to implement the following loss function

loss = max(ap_distance - an_distance + margin, 0)

To me, the most straightforward implementation would be using torch.max:

losses = torch.max(ap_distances - an_distances + margin, torch.Tensor([0]))

However, I saw other implementations on GitHub using F.relu:

losses = F.relu(ap_distances - an_distances + margin)

They give essentially the same output, but I wonder whether there is any fundamental difference between the two methods.



Solution 1:[1]

torch.max is sometimes described as non-differentiable (see this discussion), but for this element-wise use the distinction from F.relu is smaller than that suggests: both functions are differentiable everywhere except at zero, and autograd applies a subgradient at that point, so backprop works with either. In practice F.relu is preferred because it states the intent directly and avoids allocating an extra zero tensor. Note that the reduction form torch.max(X) (maximum over a dimension), which routes gradient only to the single maximal element, is a different operation from the element-wise torch.max(X, other) used here.
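A quick sketch illustrating the point above: for the same (hypothetical) distance values, the torch.max and F.relu formulations produce identical losses and identical gradients under autograd.

```python
import torch
import torch.nn.functional as F

# Hypothetical anchor-positive / anchor-negative distances and margin
ap = torch.tensor([0.5, 2.0, 1.0], requires_grad=True)
an = torch.tensor([1.0, 0.5, 3.0], requires_grad=True)
margin = 1.0

# Variant 1: element-wise max against zero
loss_max = torch.max(ap - an + margin, torch.zeros_like(ap)).sum()
loss_max.backward()
grad_max = ap.grad.clone()

# Reset gradients and try variant 2: F.relu clamps negatives the same way
ap.grad = None
an.grad = None
loss_relu = F.relu(ap - an + margin).sum()
loss_relu.backward()

# Both backward passes yield the same gradients for these inputs
print(torch.equal(grad_max, ap.grad))  # True
```

The only point where the two can disagree is at an exact tie (the pre-clamp value being exactly zero), where subgradient conventions may differ; away from that point the gradients match element for element.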

Solution 2:[2]

If you are trying to limit the output value, as in ReLU6 (https://pytorch.org/docs/stable/generated/torch.nn.ReLU6.html), you can use F.hardtanh:

import torch.nn.functional as F

x1 = F.hardtanh(x, min_val=min_value, max_val=max_value)

This preserves the differentiability of the model. The output is clipped to the range [min_val, max_val], giving a ReLU6-like saturating curve (the exact bounds depend on the values you choose).
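As a concrete sketch (the input values and bounds here are illustrative, not prescribed by the answer), F.hardtanh with min_val=0 and max_val=6 reproduces ReLU6 exactly:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, 0.0, 3.0, 8.0])

# Clip to [0, 6]: negatives become 0, values above 6 saturate at 6
y = F.hardtanh(x, min_val=0.0, max_val=6.0)
print(y)  # tensor([0., 0., 3., 6.])

# For this particular range, the built-in relu6 gives the same result
print(torch.equal(y, F.relu6(x)))  # True
```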

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: scarecrow
Solution 2: Jeremy Caney