How do I fix the target batch size mismatch error when using CrossEntropyLoss?

I am working on a training task with a CNN. When I created the loss function with CrossEntropyLoss and trained on the dataset, I got an error saying that the batch sizes do not match. This is the main code for training:

net = SimpleConvolutionalNetwork()

train_history, val_history = train(net, batch_size=32, n_epochs=10, learning_rate=0.001)

plot_losses(train_history, val_history)

This is the neural network code:

class SimpleConvolutionalNetwork(nn.Module):

  # Q: why the scope of input not changed after relu??
  
  def __init__(self) -> None:
      super(SimpleConvolutionalNetwork, self).__init__()

      # define the convolutional filtering layer (3 input channels, 18 output channels)
      self.conv1 = nn.Conv2d(3, 18, kernel_size=3, stride=1, padding=1)

      # define pooling layer with max-pooling function
      self.pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)

      # define FCL and output layer by Linear function
      self.fc1 = nn.Linear(18*16*16, 64)
      self.fc2 = nn.Linear(64, 10)

  # Q: where the pooling layer??

  def forward(self, x):
    # input shape: 3 (channels) * 32 * 32 (each channel is 32*32)
    # filtered by conv1, defined in the constructor
    # then relu is applied to the filtered x
    x = F.relu(self.conv1(x))

    # now let 18*32*32 -> 18*16*16
    x = x.view(-1, 18*16*16)

    # two step for 18*16*16(totally 4608) -> 64
    # output by FC firstly, then relu again the output
    x = F.relu(self.fc1(x))

    # 64 -> 10 finally
    x = self.fc2(x)
    return x

In the train function, the error is raised where the loss is computed. The function is long, so only the main part is shown below:

def train(net, batch_size, n_epochs, learning_rate):
...
    # load the training dataset
  train_loader = get_train_loader(batch_size)

  # get validation dataset
  val_loader = get_val_loader(batch_size)

  # set batch size
  n_minibatches = len(train_loader)

  # set loss function and validation test checking
  criterion, optimizer = createLossAndOptimizer(net, learning_rate)

  train_history = []
  val_history = []

  training_start_time = time.time()
  best_error = np.inf
  best_model_path = "best_model_path"

  # GPU if possible
  net = net.to(device)

  for epoch in range(n_epochs):

    running_loss = 0.0
    print_every = n_minibatches
    start_time = time.time()
    total_train_loss = 0.0

    # step1: training the datasets
    for i, (inputs, labels) in enumerate(train_loader):
      inputs, labels = inputs.to(device), labels.to(device)

      optimizer.zero_grad()

      # forward + backward + optimize
      outputs = net(inputs)
      loss = criterion(outputs, labels)
      loss.backward()
      optimizer.step()

      #print statistics
      running_loss += loss.item()
      total_train_loss += loss.item()

      # print every 10th of epoch
      if (i + 1) % (print_every + 1) == 0:    
        print("Epoch {}, {:d}% \t train_loss: {:.2f} took: {:.2f}s".format(
          epoch + 1, int(100 * (i + 1) / n_minibatches), running_loss / print_every,
          time.time() - start_time))
        running_loss = 0.0
        start_time = time.time()

    train_history.append(total_train_loss / len(train_loader))
...

The loss construction function and the dataset loaders look like this:

def createLossAndOptimizer(net, learning_rate=0.001):

  # define a cross-entropy loss function:
  criterion = nn.CrossEntropyLoss()

  # Adam optimizer over the network parameters, using the given
  # learning rate (no momentum argument is passed here)

  optimizer = opt.Adam(net.parameters(), lr=learning_rate)
  return criterion, optimizer

def get_train_loader(batch_size):
  return th.utils.data.DataLoader(train_set,batch_size=batch_size,sampler=train_sampler, num_workers=num_workers)

def get_val_loader(batch_size):
  return th.utils.data.DataLoader(train_set,batch_size=batch_size,sampler=train_sampler, num_workers=num_workers)

However, the error says that the input batch size is larger than the target batch size:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-19-07b692e7a2bb> in <module>()
    173 net = SimpleConvolutionalNetwork()
    174 
--> 175 train_history, val_history = train(net, batch_size=32, n_epochs=10, learning_rate=0.001)
    176 
    177 plot_losses(train_history, val_history)

3 frames
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction, label_smoothing)
   2844     if size_average is not None or reduce is not None:
   2845         reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 2846     return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
   2847 
   2848 

ValueError: Expected input batch_size (128) to match target batch_size (32).

My first thought was that I had set some parameter incorrectly, perhaps related to 'labels', which has size 4. But I don't know how to fix it. Thanks for any answers.



Solution 1:[1]

In the forward method of SimpleConvolutionalNetwork, after conv1 is applied, the tensor x has shape (batch_size, 18, 32, 32). That is batch_size * 18432 elements, so x = x.view(-1, 18 * 16 * 16) reshapes it to (batch_size * 4, 18 * 16 * 16): each row holds only 4608 values, and the -1 dimension absorbs the extra factor of 4. The fully connected layers applied afterwards don't change this new batch size, so the output has shape (batch_size * 4, 10), which is 128 rows for a batch of 32. My suggestion is to apply pooling right after the convolution, like:

 x = F.relu(self.conv1(x))  # after that x will have shape (batch_size, 18, 32, 32) 
 x = self.pool(x)           # after that x will have shape (batch_size, 18, 16, 16)
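
The factor of 4 can be checked by hand with plain arithmetic. The sketch below is only an illustration: inferred_rows is a hypothetical helper (not part of PyTorch) that mimics how view(-1, features) derives the first dimension from the total element count.

```python
def inferred_rows(batch_size, channels, height, width, features):
    """Rows that x.view(-1, features) would infer for a (batch, C, H, W) tensor.

    Hypothetical helper for illustration; it only reproduces the arithmetic.
    """
    total = batch_size * channels * height * width
    if total % features != 0:
        raise ValueError("element count is not divisible by features")
    return total // features


FC1_INPUTS = 18 * 16 * 16  # 4608, the in_features of fc1 in the question

# Without pooling, the conv1 output is (32, 18, 32, 32).
rows_without_pool = inferred_rows(32, 18, 32, 32, FC1_INPUTS)

# With 2x2 max pooling, the tensor is (32, 18, 16, 16) before flattening.
rows_with_pool = inferred_rows(32, 18, 16, 16, FC1_INPUTS)

print(rows_without_pool)  # 128, matches "input batch_size (128)" in the error
print(rows_with_pool)     # 32, matches the target batch size
```

Only the pooled shape leaves the batch dimension equal to the 32 labels in each batch.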

With pooling in place, forward returns a tensor of shape (batch_size, 10), and the batch size mismatch error no longer occurs.
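
For reference, here is a minimal, self-contained sketch of the network with the pooling step added. The class name PooledConvNet and the random stand-in data are assumptions for illustration, not the asker's actual dataset or loaders. It also flattens with torch.flatten(x, start_dim=1) instead of view(-1, ...), which is a swapped-in alternative: it keeps the batch dimension fixed, so a wrong feature count fails at fc1 rather than silently changing the batch size.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PooledConvNet(nn.Module):
    """Hypothetical corrected variant of SimpleConvolutionalNetwork."""

    def __init__(self) -> None:
        super().__init__()
        self.conv1 = nn.Conv2d(3, 18, kernel_size=3, stride=1, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        self.fc1 = nn.Linear(18 * 16 * 16, 64)
        self.fc2 = nn.Linear(64, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))          # (batch, 18, 32, 32)
        x = self.pool(x)                   # (batch, 18, 16, 16)
        # Flatten everything except the batch dimension.
        x = torch.flatten(x, start_dim=1)  # (batch, 4608)
        x = F.relu(self.fc1(x))            # (batch, 64)
        return self.fc2(x)                 # (batch, 10)


# Random stand-in data (an assumption): 32 RGB images of 32x32, 10 classes.
inputs = torch.randn(32, 3, 32, 32)
labels = torch.randint(0, 10, (32,))

net = PooledConvNet()
outputs = net(inputs)

# The batch dimensions now agree, so the loss can be computed.
loss = nn.CrossEntropyLoss()(outputs, labels)

print(outputs.shape)  # torch.Size([32, 10])
```

If the pooling line were removed from this version, fc1 would raise a shape error about the 18432 input features, which points more directly at the cause than the batch size message from the loss.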

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution    Source
Solution 1  draw