How padding works in PyTorch

If I understand PyTorch's implementation of the Conv2d layer correctly, the padding parameter expands the convolved image with zeros on all four sides. So, for an image of shape (6,6) with padding = 2, stride = 2 and kernel = (5,5), the convolution alone gives an output of shape (1,1). Then padding = 2 pads it with zeros (2 up, 2 down, 2 left and 2 right), resulting in a convolved image of shape (5,5).

However, when running the following script:

import torch
from torch import nn
x = torch.ones(1,1,6,6)
y = nn.Conv2d(in_channels=1, out_channels=1,
              kernel_size=5, stride=2,
              padding=2)(x)

I got the following outputs:

y.shape
==> torch.Size([1, 1, 3, 3]) ("So shape of convolved image = (3,3) instead of (5,5)")

y[0][0]
==> tensor([[0.1892, 0.1718, 0.2627, 0.2627, 0.4423, 0.2906],
    [0.4578, 0.6136, 0.7614, 0.7614, 0.9293, 0.6835],
    [0.2679, 0.5373, 0.6183, 0.6183, 0.7267, 0.5638],
    [0.2679, 0.5373, 0.6183, 0.6183, 0.7267, 0.5638],
    [0.2589, 0.5793, 0.5466, 0.5466, 0.4823, 0.4467],
    [0.0760, 0.2057, 0.1017, 0.1017, 0.0660, 0.0411]],
   grad_fn=<SelectBackward>)

Normally it should be filled with zeroes. I'm confused. Can anyone help please?



Solution 1:[1]

The input is padded, not the output. In your case, the conv2d layer will apply a two-pixel padding on all sides just before computing the convolution operation.
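The resulting shape can be checked by hand. As a minimal sketch, assuming the output-size formula given in the PyTorch documentation for Conv2d (the helper name conv_output_size is hypothetical, not part of PyTorch), each spatial dimension works out as follows:

```python
def conv_output_size(size, kernel, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension of a convolution.

    Hypothetical helper; it mirrors the shape formula documented for
    torch.nn.Conv2d: floor((size + 2*padding - dilation*(kernel-1) - 1) / stride) + 1
    """
    padded = size + 2 * padding                      # padding is added to the input
    effective_kernel = dilation * (kernel - 1) + 1   # kernel extent including dilation
    return (padded - effective_kernel) // stride + 1


# The case from the question: 6x6 input, 5x5 kernel, stride 2
print(conv_output_size(6, kernel=5, stride=2, padding=2))  # 3
print(conv_output_size(6, kernel=5, stride=2, padding=0))  # 1
```

With padding=2 the input is effectively 10x10, so a 5x5 kernel with stride 2 fits three times along each axis, which gives the (3,3) output. Without padding the same kernel fits only once, which is the (1,1) case from the question.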

For illustration purposes,

>>> import torch
>>> import torch.nn.functional as F
>>> weight = torch.rand(1, 1, 5, 5)
  • Here we apply a convolution with padding=2:

    >>> x = torch.ones(1,1,6,6)
    >>> F.conv2d(x, weight, stride=2, padding=2)
    tensor([[[[ 5.9152,  8.8923,  6.0984],
              [ 8.9397, 14.7627, 10.8613],
              [ 7.2708, 12.0152,  9.0840]]]])
    
  • Here we skip the padding argument and pad the input ourselves instead:

    >>> x_padded = F.pad(x, (2,)*4)
    >>> x_padded
    tensor([[[[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
              [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
              [0., 0., 1., 1., 1., 1., 1., 1., 0., 0.],
              [0., 0., 1., 1., 1., 1., 1., 1., 0., 0.],
              [0., 0., 1., 1., 1., 1., 1., 1., 0., 0.],
              [0., 0., 1., 1., 1., 1., 1., 1., 0., 0.],
              [0., 0., 1., 1., 1., 1., 1., 1., 0., 0.],
              [0., 0., 1., 1., 1., 1., 1., 1., 0., 0.],
              [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
              [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]]]])
    
    >>> F.conv2d(x_padded, weight, stride=2)
    tensor([[[[ 5.9152,  8.8923,  6.0984],
              [ 8.9397, 14.7627, 10.8613],
              [ 7.2708, 12.0152,  9.0840]]]])
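
The equivalence of the two approaches can also be checked programmatically instead of comparing the printed values by eye. This is a minimal sketch, assuming a random weight with a fixed seed (the seed and the variable names out_builtin and out_manual are illustrative choices, not from the original answer):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed, only to make the run repeatable

x = torch.ones(1, 1, 6, 6)
weight = torch.rand(1, 1, 5, 5)

# Let conv2d zero-pad the input by 2 pixels on every side
out_builtin = F.conv2d(x, weight, stride=2, padding=2)

# Zero-pad the input ourselves (left, right, top, bottom), then convolve without padding
x_padded = F.pad(x, (2, 2, 2, 2))
out_manual = F.conv2d(x_padded, weight, stride=2)

print(out_builtin.shape)                         # torch.Size([1, 1, 3, 3])
print(torch.allclose(out_builtin, out_manual))   # True
```

Both calls produce a (3,3) output with matching values, which is consistent with the padding being applied to the input before the convolution rather than to its result.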
    

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1