Testing my CNN on a small set of images, but training has no effect

I constructed a CNN to recognize 9 classes of gestures in 224x224x3 images. To test its basic functionality, I train it on just 16 images and check whether it overfits to 100% accuracy. Here is my network:

    import torch.nn as nn
    import torch.nn.functional as F   # needed for F.relu and F.softmax below
    class learn_gesture(nn.Module):
        def __init__(self):
            super(learn_gesture, self).__init__()
            self.name = "gesture_learner"
            self.conv1 = nn.Conv2d(in_channels=3, out_channels=20, kernel_size=5, stride=1, padding=2)                 
            self.conv2 = nn.Conv2d(in_channels=20, out_channels=50, kernel_size=5, stride=1, padding=2)
            self.conv3 = nn.Conv2d(in_channels=50, out_channels=100, kernel_size=5, stride=1, padding=2)
            self.conv4 = nn.Conv2d(in_channels=100, out_channels=200, kernel_size=5, stride=1, padding=2)
            self.conv5 = nn.Conv2d(in_channels=200, out_channels=400, kernel_size=5, stride=1, padding=2)                
            self.pool1 = nn.MaxPool2d(2,2)
            self.pool2 = nn.MaxPool2d(2,2)
            self.pool3 = nn.MaxPool2d(2,2)
            self.pool4 = nn.MaxPool2d(2,2)
            self.pool5 = nn.MaxPool2d(2,2)
            self.fc1 = nn.Linear(7*7*400, 10000)
            self.fc2 = nn.Linear(10000, 3000)
            self.fc3 = nn.Linear(3000, 9)
    
        def forward(self, x):
            x = self.pool1(F.relu(self.conv1(x))) # gives 112*20
            x = self.pool2(F.relu(self.conv2(x))) # gives 56*50
            x = self.pool3(F.relu(self.conv3(x))) # gives 28*100
            x = self.pool4(F.relu(self.conv4(x))) # gives 14*200
            x = self.pool5(F.relu(self.conv5(x))) # gives 7*400
            x = x.view(-1, 7*7*400)
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            return F.softmax(self.fc3(x), dim=1)
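The `7*7*400` figure that `fc1` expects can be sanity-checked with a dummy forward pass (a sketch, not part of the original post): each conv uses `padding=2` with a 5x5 kernel, so it preserves height and width, and each 2x2 max pool halves them, giving 224 → 112 → 56 → 28 → 14 → 7.

```python
import torch
import torch.nn as nn

# Walk a dummy 224x224x3 batch through the same five conv+pool blocks.
x = torch.randn(1, 3, 224, 224)
for in_ch, out_ch in [(3, 20), (20, 50), (50, 100), (100, 200), (200, 400)]:
    conv = nn.Conv2d(in_ch, out_ch, kernel_size=5, stride=1, padding=2)
    x = nn.MaxPool2d(2, 2)(conv(x))

print(x.shape)             # torch.Size([1, 400, 7, 7])
print(x.flatten(1).shape)  # flattened size matches fc1's 7*7*400 = 19600
```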

And here is the training code:

    overfit_model = learn_gesture()
    num_epochs = 200   #set it high so that it will converge
    ## loss function and optimizer
    criterion = nn.CrossEntropyLoss()    
    optimizer = optim.SGD(over_model.parameters(), lr=0.001, momentum=0.9)       #optimizer is SGD with momentum

    ## set up some empty np arrays to store our result for plotting later
    train_err = np.zeros(num_epochs)
    train_loss = np.zeros(num_epochs)
    ################################################ train the network
    for epoch in range(num_epochs):
        total_train_loss = 0
        total_train_err = 0
        total_epoch = 0
        for i, data in enumerate(smallLoader, 0):
            inputs, labels = data
            outputs = over_model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            corr = (determine_corr(outputs, labels))    # get a list of bool representing right or wrong predictions in the batch
            total_train_err += corr.count(False)        
            total_train_loss += loss.item()
            total_epoch += len(labels)
        train_err[epoch] = float(total_train_err) / total_epoch
        train_loss[epoch] = float(total_train_loss) / (i+1)
        print("Epoch {}: Train err: {}, Train loss: {}".format(
            epoch + 1,
            train_err[epoch],
            train_loss[epoch]))

The training has no effect: neither the accuracy nor the loss improves. I absolutely can't figure out where the error is. Any help is greatly appreciated!

############### Update ##############

I got rid of the softmax in the forward function. Surprisingly, the performance of the model hasn't changed much. I also notice that some elements of the output are now negative, and the elements across all classes do not sum to 1. Is this supposed to happen? Output:

tensor([[ 0.0165, -0.0041,  0.0043,  0.0017,  0.0238,  0.0329, -0.0265, -0.0224,
     -0.0187],
    [ 0.0163, -0.0044,  0.0036,  0.0028,  0.0248,  0.0334, -0.0268, -0.0218,
     -0.0194],
    [ 0.0161, -0.0046,  0.0041,  0.0019,  0.0240,  0.0333, -0.0266, -0.0223,
     -0.0192],
    [ 0.0190, -0.0044,  0.0035,  0.0015,  0.0244,  0.0322, -0.0267, -0.0223,
     -0.0187],
    [ 0.0174, -0.0048,  0.0033,  0.0021,  0.0251,  0.0328, -0.0257, -0.0225,
     -0.0190],
    [ 0.0175, -0.0041,  0.0033,  0.0031,  0.0241,  0.0329, -0.0264, -0.0222,
     -0.0192],
    [ 0.0168, -0.0042,  0.0033,  0.0022,  0.0251,  0.0335, -0.0269, -0.0225,
     -0.0195],
    [ 0.0163, -0.0047,  0.0037,  0.0030,  0.0243,  0.0336, -0.0265, -0.0227,
     -0.0192],
    [ 0.0165, -0.0043,  0.0038,  0.0026,  0.0242,  0.0337, -0.0264, -0.0222,
     -0.0191],
    [ 0.0163, -0.0051,  0.0038,  0.0016,  0.0236,  0.0338, -0.0258, -0.0223,
     -0.0195],
    [ 0.0173, -0.0037,  0.0038,  0.0018,  0.0236,  0.0322, -0.0269, -0.0225,
     -0.0191],
    [ 0.0174, -0.0044,  0.0031,  0.0019,  0.0241,  0.0334, -0.0266, -0.0224,
     -0.0200],
    [ 0.0164, -0.0038,  0.0034,  0.0029,  0.0245,  0.0342, -0.0269, -0.0225,
     -0.0200],
    [ 0.0173, -0.0046,  0.0036,  0.0021,  0.0245,  0.0328, -0.0264, -0.0221,
     -0.0192],
    [ 0.0168, -0.0046,  0.0034,  0.0025,  0.0248,  0.0336, -0.0262, -0.0222,
     -0.0194],
    [ 0.0166, -0.0051,  0.0033,  0.0015,  0.0234,  0.0331, -0.0270, -0.0218,
     -0.0186]], grad_fn=<AddmmBackward>)
Epoch 199: Train err: 0.8125, Train loss: 2.1874701976776123


Solution 1:[1]

  1. It seems that your model is named overfit_model, but you pass over_model.parameters() to the optimizer:

    optimizer = optim.SGD(over_model.parameters(), lr=0.001, momentum=0.9)
    

    This should be replaced with overfit_model.parameters().

  2. You are zeroing your gradients right after you backpropagate, whereas it should be done beforehand. So, the following lines:

         loss.backward()
         optimizer.step()
         optimizer.zero_grad()
    

    Should be replaced with:

         optimizer.zero_grad()
         loss.backward()
         optimizer.step()
    
  3. There is no need to call F.softmax in

    return F.softmax(self.fc3(x), dim=1)
    

    since you are using nn.CrossEntropyLoss, which calls F.cross_entropy; that function natively bundles log_softmax before calling nll_loss:

    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
    
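Point 3 can be checked directly: `F.cross_entropy` on raw logits matches a hand-built `log_softmax` + `nll_loss` pipeline, so applying softmax yourself beforehand is redundant. A minimal sketch (not from the original answer):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 9)            # batch of 4, 9 gesture classes
target = torch.tensor([0, 3, 5, 8])

# cross_entropy applied to raw logits...
ce = F.cross_entropy(logits, target)
# ...equals log_softmax followed by the negative log-likelihood loss.
nll = F.nll_loss(F.log_softmax(logits, dim=1), target)

print(torch.allclose(ce, nll))  # True
```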

Solution 2:[2]

First of all, you should remove the softmax before `nn.CrossEntropyLoss`, as the other answers discovered.

Now, the actual problem is that your CNN is very deep while your dataset is very small. Under these conditions, either you may need thousands of epochs, or the model may not converge at all. If you can enlarge your dataset or use a shallower CNN, you can overcome the convergence problem you are facing.

But if you don't want to change those, here is my suggestion.
Quoting from here:

The most basic method of hyper-parameter search is to do a grid search over the learning rate and batch size to find a pair which makes the network converge.

Let's talk about batch size. If your dataset is very small and the model is deep, a moderate batch size (i.e. 32 or 16) may not work. Since you want to test an overfitting model, try a very small batch size (4 or 8). This helps the optimizer settle into a minimum for a small number of samples quickly.

If you have already tried this, try increasing the learning rate.
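The grid search mentioned above can be sketched as follows. This is a toy sketch with random stand-in data and a tiny stand-in linear model (both hypothetical, not from the original setup); in the real case you would substitute the 16-image loader and the CNN:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
# Stand-in for the 16-image training set: random 32x32 "images", 9 classes.
data = TensorDataset(torch.randn(16, 3, 32, 32), torch.randint(0, 9, (16,)))

def final_loss(lr, batch_size, epochs=30):
    """Train a tiny stand-in model and report the last batch loss."""
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 9))
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    crit = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in DataLoader(data, batch_size=batch_size, shuffle=True):
            opt.zero_grad()
            loss = crit(model(x), y)
            loss.backward()
            opt.step()
    return loss.item()

# Grid over learning rate and batch size; pick the pair that converges best.
results = {(lr, bs): final_loss(lr, bs)
           for lr in (0.001, 0.01, 0.1) for bs in (4, 8, 16)}
best = min(results, key=results.get)
print(best, results[best])
```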

Solution 3:[3]

It's not reducing because you are applying softmax before CrossEntropyLoss, so you are just reducing the amount of gradient flowing back.

Just remove the softmax, and it should work.

(Will edit it to add the reason later)
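The gradient-shrinking claim can be illustrated numerically (a sketch, not from the original answer): softmax squashes its inputs into [0, 1] with a small spread, so pushing those values through cross-entropy again yields much smaller gradients at the logits than feeding the raw logits directly.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 9, requires_grad=True)
target = torch.randint(0, 9, (8,))

# Correct: raw logits straight into cross_entropy.
F.cross_entropy(logits, target).backward()
grad_direct = logits.grad.norm()

logits.grad = None
# Buggy: softmax first, then cross_entropy (which softmaxes again).
F.cross_entropy(F.softmax(logits, dim=1), target).backward()
grad_double = logits.grad.norm()

print(grad_direct.item(), grad_double.item())  # second is much smaller
```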

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1:
Solution 2: Mushfiqur Rahman
Solution 3: Karan Dhingra