How does OpenCV's DFT-based filter2D actually work?

I have trouble understanding how the filter2D method of the OpenCV library actually works. The "standard" direct version for small kernel sizes is clear, but for sufficiently large kernels ("~11 x 11 or larger") the function "uses the DFT-based algorithm", according to the documentation.

Which DFT-based algorithm? No scientific reference is given, and I wasn't able to determine which algorithm is used in particular. I am aware of how a DFT works, so I reckon both the input image and the kernel are transformed into a spectral representation, the convolution is carried out there, and the result is transformed back? If this is the general idea, I would still like some additional background information.
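Here is roughly what I imagine happens, as a minimal NumPy sketch of the general idea (illustration only, not OpenCV's actual code; it assumes an odd-sized kernel and zero/"constant" border handling, and flips the kernel because filter2D computes correlation rather than convolution):

```python
import numpy as np

def fft_filter2d(image, kernel):
    # Pad both inputs to the full linear-convolution size so the DFT's
    # inherent circular convolution cannot wrap around the image edges.
    s = (image.shape[0] + kernel.shape[0] - 1,
         image.shape[1] + kernel.shape[1] - 1)
    F = np.fft.rfft2(image, s)
    G = np.fft.rfft2(kernel[::-1, ::-1], s)  # flipped: correlation, like filter2D
    full = np.fft.irfft2(F * G, s)
    # Keep the central "same"-sized region, analogous to filter2D's output.
    r0, r1 = kernel.shape[0] // 2, kernel.shape[1] // 2
    return full[r0:r0 + image.shape[0], r1:r1 + image.shape[1]]
```

On a float image this should agree, up to floating-point noise, with cv2.filter2D(image, -1, kernel, borderType=cv2.BORDER_CONSTANT).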



Solution 1:[1]

The technique used to compute convolutions via the DFT is generally known as "overlap-add". It is appropriate when the kernel is large, but still quite a bit smaller than the signal/image that you're applying it to.

In 2D, given an MxM kernel (usually M would be rounded up to a power of 2), the procedure is as follows (a code sketch follows the list):

  1. The image is divided into MxM blocks.
  2. The blocks and the kernel are padded out with zeros to 2Mx2M size.
  3. FFT convolution is used to convolve each block with the kernel. Each convolution produces a 2Mx2M result, but because the non-zero parts of the inputs are only MxM, the convolution doesn't wrap around. The same kernel FFT can be reused for all blocks.
  4. The 2Mx2M results from each MxM block are then added back together in an overlapping fashion to produce the resulting image.
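A minimal NumPy sketch of those four steps (illustrative only: the loop structure, variable names, and the "full" output size are my own choices, and this computes a plain convolution, whereas OpenCV's real implementation is considerably more involved):

```python
import numpy as np

def overlap_add_convolve2d(image, kernel):
    M = kernel.shape[0]                         # assumes an MxM kernel
    H, W = image.shape
    out = np.zeros((H + M - 1, W + M - 1))      # full linear-convolution size
    Kf = np.fft.rfft2(kernel, (2 * M, 2 * M))   # kernel FFT computed once, reused
    for i in range(0, H, M):
        for j in range(0, W, M):
            block = image[i:i + M, j:j + M]     # MxM block (smaller at the edges)
            Bf = np.fft.rfft2(block, (2 * M, 2 * M))    # zero-padded to 2Mx2M
            conv = np.fft.irfft2(Bf * Kf, (2 * M, 2 * M))
            # Each block's linear convolution occupies (block + M - 1) samples
            # per dimension; add it into the output at the block's offset, so
            # neighbouring blocks' results overlap and sum correctly.
            h, w = block.shape[0] + M - 1, block.shape[1] + M - 1
            out[i:i + h, j:j + w] += conv[:h, :w]
    return out
```

For a quick sanity check, scipy.signal.fftconvolve(image, kernel) computes the same "full" convolution in one shot.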

This technique has been generally known in the signal processing field for a long time, and I don't have a reference to its origin. There's also a variation known as "overlap-save".

A Google search for "2d overlap add convolution" yields somewhat satisfying results.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1: Matt Timmermans