How does contrastive loss work intuitively in a Siamese network?

I am having trouble getting a clear understanding of the contrastive loss used in Siamese networks.

Here is the PyTorch formula:

torch.mean((1 - label) * torch.pow(euclidean_distance, 2) +
           (label) * torch.pow(torch.clamp(margin - euclidean_distance, min=0.0), 2))

where margin=2.
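
For context, here is a minimal, self-contained sketch of how a snippet like this is typically wrapped into a loss module. The class name, the forward signature, and the use of F.pairwise_distance are my assumptions about the surrounding code, not part of the original snippet:

import torch
import torch.nn.functional as F

class ContrastiveLoss(torch.nn.Module):
    # Assumed wrapper around the formula quoted above, with margin=2.0 as in the question.
    def __init__(self, margin=2.0):
        super().__init__()
        self.margin = margin

    def forward(self, output1, output2, label):
        # label = 0 for same-class pairs, label = 1 for different-class pairs
        euclidean_distance = F.pairwise_distance(output1, output2)
        return torch.mean(
            (1 - label) * torch.pow(euclidean_distance, 2)
            + label * torch.pow(torch.clamp(self.margin - euclidean_distance, min=0.0), 2)
        )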

If we write this as an equation, it becomes

(1 - Y) * D^2 + Y * max(m - D, 0)^2

where Y = 0 if both images are from the same class, and Y = 1 if they are from different classes.

The way I think about it: if the images are from the same class, the distance between the embeddings should decrease, and if they are from different classes, the distance should increase.

I am unable to map this intuition onto the contrastive loss.

Let's say Y is 1 and the distance is large: the first part becomes zero because of (1 - Y), and the second part also becomes zero, because max picks whichever of m - D and 0 is bigger, and m - D is negative. So the loss is zero, which does not make sense to me. Can you please help me understand this?



Solution 1:[1]

If the distance of a negative (different-class) pair is greater than the specified margin, it is already separable from positive pairs. Therefore, there is no benefit in pushing it farther away, and the loss deliberately contributes nothing in that case.
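
To make this concrete, here is a small numeric check (values chosen purely for illustration) of how the two terms behave with margin = 2:

import torch

margin = 2.0

def pair_loss(distance, label):
    # label = 0: same class -> penalize any distance (pull the pair together)
    # label = 1: different class -> penalize only distances below the margin (push apart up to the margin)
    d = torch.tensor(distance)
    return ((1 - label) * d.pow(2)
            + label * torch.clamp(margin - d, min=0.0).pow(2)).item()

print(pair_loss(0.5, 0))  # same class, small distance       -> 0.25 (small pull)
print(pair_loss(3.0, 0))  # same class, large distance       -> 9.0  (strong pull together)
print(pair_loss(0.5, 1))  # different class, too close       -> 2.25 (push apart)
print(pair_loss(3.0, 1))  # different class, past the margin -> 0.0  (already separable, nothing to gain)

The last case is exactly the situation described in the question: once a negative pair is farther apart than the margin, the loss (and hence the gradient) is zero by design.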

For details, please check this blog post, where the concept of "Equilibrium" is explained, along with why the contrastive loss makes reaching that point easier.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source

Solution 1: Knipser