Why is the convolutional filter flipped in convolutional neural networks?

I don't understand why there is the need to flip filters when using convolutional neural networks.

According to the lasagne documentation,

flip_filters : bool (default: True)

Whether to flip the filters before sliding them over the input, performing a convolution (this is the default), or not to flip them and perform a correlation. Note that for some other convolutional layers in Lasagne, flipping incurs an overhead and is disabled by default – check the documentation when using learned weights from another layer.

What does that mean? I never read about flipping filters when convolving in any neural network book. Would someone clarify, please?



Solution 1:[1]

Firstly, since CNN filters are learned from data rather than designed by hand, it makes no practical difference whether the layer flips or not: if the flip operation were necessary, the network would simply learn the already-flipped filters, and cross-correlation with those filters would produce the same result. Secondly, flipping is necessary in 1D time-series processing because past inputs influence the current system output given the "current" input; that "signal versus system" relationship is what forces the kernel reversal that defines convolution. In 2D/3D spatial convolution over images there is no concept of "time", hence no "past" input affecting the "now", so we never need to consider the relationship of "signal" and "system"; there is only the relationship between one "signal" (an image patch) and another "signal" (the filter). That means we only need cross-correlation instead of convolution, even though deep learning borrowed the concept from signal processing. Therefore, the flip operation is actually not needed. (I guess.)
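A minimal sketch of the first point with SciPy (the weights here are random stand-ins for learned ones): a correlation layer whose stored weights are the flipped filter computes exactly the same function as a convolution layer.

```python
import numpy as np
from scipy.signal import convolve2d, correlate2d

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))   # stand-in for an input patch
w = rng.normal(size=(3, 3))       # stand-in for a learned filter

# Convolution with w == correlation with the flipped copy of w,
# so a network trained with correlation just learns np.flip(w).
conv_out = convolve2d(image, w, mode='valid')
corr_out = correlate2d(image, np.flip(w), mode='valid')
print(np.allclose(conv_out, corr_out))   # True
```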

Solution 2:[2]

I never read about flipping filters when convolving in any neural network book.

You can try a simple experiment. Take an image whose centre pixel has value 1 and all other pixels have value 0 (a discrete impulse). Now take any filter smaller than the image, say a 3 by 3 filter with the values 1 through 9. Do a plain correlation instead of a convolution. You end up with the flipped filter as the output of the operation.

Now flip the filter yourself and then do the same operation (which is exactly a convolution). This time you end up with the original filter as the output; see the sketch below.
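Here is that experiment as a short SciPy sketch:

```python
import numpy as np
from scipy.signal import correlate2d

# 7x7 image: a single 1 at the centre, zeros elsewhere (a discrete impulse)
image = np.zeros((7, 7))
image[3, 3] = 1.0

# 3x3 filter with the values 1..9
kernel = np.arange(1, 10, dtype=float).reshape(3, 3)

# Step 1: plain correlation of the impulse yields the FLIPPED filter
corr = correlate2d(image, kernel, mode='same')
print(corr[2:5, 2:5])                      # equals np.flip(kernel)

# Step 2: flip the filter first, then correlate (i.e. convolve);
# the impulse now acts as an identity and returns the original filter
conv = correlate2d(image, np.flip(kernel), mode='same')
print(conv[2:5, 2:5])                      # equals kernel
```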

The second operation seems neater: like multiplying by 1, it returns the same value. However, the first one is not necessarily wrong. It works most of the time even though it may not have nice mathematical properties; after all, why would the program care whether the operation is associative or not? It just does the job it is told to do. Moreover, the filter could be symmetric: flipping it returns the same filter, so the correlation operation and the convolution operation return the same output, as the sketch below shows.
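For instance, with a hypothetical symmetric 3x3 kernel:

```python
import numpy as np
from scipy.signal import convolve2d, correlate2d

image = np.random.default_rng(0).normal(size=(8, 8))

# A symmetric kernel is its own flip, so convolution == correlation
sym = np.array([[1., 2., 1.],
                [2., 4., 2.],
                [1., 2., 1.]]) / 16.0
print(np.array_equal(sym, np.flip(sym)))                     # True
print(np.allclose(convolve2d(image, sym, mode='valid'),
                  correlate2d(image, sym, mode='valid')))    # True
```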

Is there a case where these mathematical properties help? They certainly do! If (a*b)*c were not equal to a*(b*c), I would not be able to combine two filters first and then apply the result to an image. To clarify, imagine I have two filters a and b and an image c. In the case of correlation, I would have to first apply b to the image c and then apply a to that result. In the case of convolution, I can compute a*b once and then apply that single combined filter to c. If I have a million images to process, the efficiency gained by pre-combining the filters a and b starts to become obvious; the sketch below checks the identity.
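A sketch of that associativity, with random stand-ins for the filters a, b and the image c (full convolution, where the identity holds exactly):

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
a = rng.normal(size=(3, 3))      # filter a (random stand-in)
b = rng.normal(size=(3, 3))      # filter b (random stand-in)
c = rng.normal(size=(32, 32))    # image c  (random stand-in)

# Sequential: apply b to the image, then a to the result
two_passes = convolve2d(convolve2d(c, b), a)

# Combined: merge the filters once, then make a single pass per image
ab = convolve2d(a, b)            # 5x5 combined filter, computed once
one_pass = convolve2d(c, ab)

print(np.allclose(two_passes, one_pass))   # True: (a*b)*c == a*(b*c)
```

With correlation the two orderings generally differ, which is why the filters could not be pre-combined this way.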

Every mathematical property that convolution satisfies brings benefits like this, and hence, if we have a choice (and we certainly do), we should prefer convolution to correlation. The only difference between them is that in convolution we flip the filter before the multiply-and-sum, while in correlation we multiply and sum directly.

Applying convolution satisfies the mathematician inside all of us and gives us some tangible benefits as well.

Though nowadays feature engineering for images is done completely end-to-end by deep learning itself, so we need not even bother with hand-designed filters, there are other traditional image-processing operations that still rely on these properties.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: (no attribution given)
Solution 2: Allohvk