How to control the output dimensions of PyTorch ConvTranspose1d?


I'm currently building a convolutional encoder-decoder network in PyTorch, using Conv1d layers for the encoder and ConvTranspose1d layers for the decoder. Unfortunately, the output dimensions of the decoder do not match those of the encoder.

How can I ensure decoder shapes match encoder shapes?

The code:

## Building the neural network
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

class Net(nn.Module):
    def __init__(self):
      super(Net, self).__init__()

      
      # Encoder: each conv shrinks the last dimension by (kernel - 1) at stride 1
      self.conv11 = nn.Conv1d(1, 12, kernel_size=(8,13), stride=1)
      self.relu11 = nn.ReLU(inplace=False)
      self.batch11 = nn.BatchNorm2d(12)
      self.conv12 = nn.Conv1d(12, 16, (1,11), 1)
      self.relu12 = nn.ReLU(inplace=False)
      self.batch12 = nn.BatchNorm2d(16)
      self.conv13 = nn.Conv1d(16, 20, (1,9), 1)
      self.relu13 = nn.ReLU(inplace=False)
      self.batch13 = nn.BatchNorm2d(20)
      self.conv14 = nn.Conv1d(20, 24, (1,7), 1)
      self.relu14 = nn.ReLU(inplace=False)
      self.batch14 = nn.BatchNorm2d(24)
      self.conv15 = nn.Conv1d(24, 32, (1,7), 1)
      self.relu15 = nn.ReLU(inplace=False)
      self.batch15 = nn.BatchNorm2d(32)

      # ConvTranspose explained: https://medium.com/@marsxiang/convolutions-transposed-and-deconvolution-6430c358a5b6
      # Decoder: each transposed conv grows the last dimension by (kernel - 1)
      self.conv25 = nn.ConvTranspose1d(32, 24, (1,7), 1)
      self.relu25 = nn.ReLU(inplace=False)
      self.batch25 = nn.BatchNorm2d(24)
      self.conv24 = nn.ConvTranspose1d(24, 20, (1,9), 1) ### Problem Layer
      self.relu24 = nn.ReLU(inplace=False)
      self.batch24 = nn.BatchNorm2d(20)
      self.conv23 = nn.ConvTranspose1d(20, 16, (1,11), 1) ### Problem Layer
      self.relu23 = nn.ReLU(inplace=False)
      self.batch23 = nn.BatchNorm2d(16)
      self.conv22 = nn.ConvTranspose1d(16, 12, (1,13), 1) ### Problem Layer
      self.relu22 = nn.ReLU(inplace=False) 
      self.batch22 = nn.BatchNorm2d(12)
      self.conv21 = nn.ConvTranspose1d(12, 1, (1,129), 1)

    def forward(self, x):
      print("Forward pass")
      print(x.shape)
      x = self.batch11(self.relu11(self.conv11(x))) #First Layer
      print("Encoder")
      print(x.shape)
      x = self.batch12(self.relu12(self.conv12(x)))
      print(x.shape)
      x = self.batch13(self.relu13(self.conv13(x)))
      print(x.shape)
      x = self.batch14(self.relu14(self.conv14(x)))
      print(x.shape)
      shape14 = x.shape # encoder shape saved for reference (currently unused)
      x = self.batch15(self.relu15(self.conv15(x)))
      print("Latent Space")
      print(x.shape)
      x = self.batch25(self.relu25(self.conv25(x)))
      print("Decoder")
      print(x.shape)
      x = self.batch24(self.relu24(self.conv24(x))) ### Problem Layer
      print("Problem Layer")
      print(x.shape)
      x = self.batch23(self.relu23(self.conv23(x))) ### Problem Layer
      print("Problem Layer")
      print(x.shape)
      x = self.batch22(self.relu22(self.conv22(x))) ### Problem Layer
      print(x.shape)
      x = self.conv21(x)
      print("Output Layer")
      print(x.shape)
      return x

net = Net()
print(net)

Creating dummy data and running a forward pass through the network:

test_samples = np.random.rand(5,8,129) ##Dummy data
Z_samples = test_samples
print(Z_samples.shape)
print(Z_samples[0,:,:].shape)
inp = torch.from_numpy(Z_samples[0,:,:]).float()
print(inp.shape)
inp = torch.unsqueeze(inp, 0)
inp = torch.unsqueeze(inp, 0)
print(inp.shape)
out = net(inp)
print("Out Shape")
print(out.shape)

Console output of the above block:

(5, 8, 129)
(8, 129)
torch.Size([8, 129])
torch.Size([1, 1, 8, 129])
Forward pass
torch.Size([1, 1, 8, 129])
Encoder
torch.Size([1, 12, 1, 117])
torch.Size([1, 16, 1, 107])
torch.Size([1, 20, 1, 99])
torch.Size([1, 24, 1, 93])
Latent Space
torch.Size([1, 32, 1, 87])
Decoder
torch.Size([1, 24, 1, 93])  # Remark: this layer's output is fine
Problem Layer
torch.Size([1, 20, 1, 101]) # Remark: Here the last dimension should be 99 instead of 101
Problem Layer
torch.Size([1, 16, 1, 111]) # Remark: Here the last dimension should be 107 instead of 111
torch.Size([1, 12, 1, 123]) # Remark: Here the last dimension should be 117 instead of 123
Output Layer
torch.Size([1, 1, 1, 251]) # Remark: Here the last dimension should be 129 instead of 251
Out Shape
torch.Size([1, 1, 1, 251])

I found this thread recommending the use of the output_size argument of ConvTranspose1d in the forward pass. If I do so, I get an IndexError (shown in the following image).

[Image: IndexError raised when using the output_size argument of ConvTranspose1d in the forward pass]
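For context, this is how the output_size hint is meant to be used: when stride > 1, several input lengths produce the same conv output length, and output_size tells the transposed conv which of the valid lengths to restore. Below is a minimal, standalone sketch of that mechanism (stride 2 chosen so the length is actually ambiguous), not my network above:

    import torch
    import torch.nn as nn

    conv = nn.Conv1d(1, 4, kernel_size=5, stride=2)
    tconv = nn.ConvTranspose1d(4, 1, kernel_size=5, stride=2)

    x = torch.randn(1, 1, 130)
    y = conv(x)                        # torch.Size([1, 4, 63]); an input of length 129 also yields 63
    z = tconv(y, output_size=x.shape)  # picks length 130 instead of the default 129
    print(z.shape)                     # torch.Size([1, 1, 130])

Note that with stride 1 there is only one valid output length per layer, so output_size cannot patch over mismatched kernel sizes.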



Solution 1:[1]

To make "conv - transposed_conv" pair preserve input shape, conv and transposed_conv should have same parameters, so, each (spatial) shape-changing conv must be paired with equally parametrized transposed_conv (well, channels less restricted then spatial parameters(kernel, stride, padding) ), yours are not.

Setting up the transposed convolutions like this:

    self.conv25 = nn.ConvTranspose1d(32, 24, (1,7), 1)
    self.relu25 = nn.ReLU(inplace=False)
    self.batch25 = nn.BatchNorm2d(24)
    self.conv24 = nn.ConvTranspose1d(24, 20, (1,7), 1) ### formerly a problem layer
    self.relu24 = nn.ReLU(inplace=False)
    self.batch24 = nn.BatchNorm2d(20)
    self.conv23 = nn.ConvTranspose1d(20, 16, (1,9), 1) ### formerly a problem layer
    self.relu23 = nn.ReLU(inplace=False)
    self.batch23 = nn.BatchNorm2d(16)
    self.conv22 = nn.ConvTranspose1d(16, 12, (1,11), 1) ### formerly a problem layer
    self.relu22 = nn.ReLU(inplace=False) 
    self.batch22 = nn.BatchNorm2d(12)
    self.conv21 = nn.ConvTranspose1d(12, 1, (8,13), 1)

The resulting shape then comes out right: torch.Size([1, 1, 8, 129]). Note that the final transposed conv's (8,13) kernel also restores the height dimension from 1 back to 8, mirroring the first encoder layer.

If you need an independent latent-space subnet, make sure it preserves its input shape as a whole, too.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Alexey Birukov