Testing my CNN on a small set of images, but training has no effect

I built a CNN to recognize 9 classes of gestures in 224x224x3 images. To sanity-check it, I train on just 16 images and expect it to overfit to 100% accuracy. Here is my network:
```python
import torch.nn as nn
import torch.nn.functional as F

class learn_gesture(nn.Module):
    def __init__(self):
        super(learn_gesture, self).__init__()
        self.name = "gesture_learner"
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=20, kernel_size=5, stride=1, padding=2)
        self.conv2 = nn.Conv2d(in_channels=20, out_channels=50, kernel_size=5, stride=1, padding=2)
        self.conv3 = nn.Conv2d(in_channels=50, out_channels=100, kernel_size=5, stride=1, padding=2)
        self.conv4 = nn.Conv2d(in_channels=100, out_channels=200, kernel_size=5, stride=1, padding=2)
        self.conv5 = nn.Conv2d(in_channels=200, out_channels=400, kernel_size=5, stride=1, padding=2)
        self.pool1 = nn.MaxPool2d(2, 2)
        self.pool2 = nn.MaxPool2d(2, 2)
        self.pool3 = nn.MaxPool2d(2, 2)
        self.pool4 = nn.MaxPool2d(2, 2)
        self.pool5 = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(7*7*400, 10000)
        self.fc2 = nn.Linear(10000, 3000)
        self.fc3 = nn.Linear(3000, 9)

    def forward(self, x):
        x = self.pool1(F.relu(self.conv1(x)))  # gives 112x112x20
        x = self.pool2(F.relu(self.conv2(x)))  # gives 56x56x50
        x = self.pool3(F.relu(self.conv3(x)))  # gives 28x28x100
        x = self.pool4(F.relu(self.conv4(x)))  # gives 14x14x200
        x = self.pool5(F.relu(self.conv5(x)))  # gives 7x7x400
        x = x.view(-1, 7*7*400)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return F.softmax(self.fc3(x), dim=1)
```
And here is the training code:
```python
overfit_model = learn_gesture()
num_epochs = 200  # set it high so that it will converge

## loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(over_model.parameters(), lr=0.001, momentum=0.9)  # optimizer is SGD with momentum

## set up some empty np arrays to store our results for plotting later
train_err = np.zeros(num_epochs)
train_loss = np.zeros(num_epochs)

################################################ train the network
for epoch in range(num_epochs):
    total_train_loss = 0
    total_train_err = 0
    total_epoch = 0
    for i, data in enumerate(smallLoader, 0):
        inputs, labels = data
        outputs = over_model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        corr = determine_corr(outputs, labels)  # a list of bools: right/wrong predictions in the batch
        total_train_err += corr.count(False)
        total_train_loss += loss.item()
        total_epoch += len(labels)
    train_err[epoch] = float(total_train_err) / total_epoch
    train_loss[epoch] = float(total_train_loss) / (i + 1)
    print(("Epoch {}: Train err: {}, Train loss: {}").format(
        epoch + 1,
        train_err[epoch],
        train_loss[epoch]))
```
The training has no effect: neither the accuracy nor the loss improves. I absolutely cannot figure out where the error is. Any help is greatly appreciated!
############### Update ##############
I got rid of the softmax in the forward function. Surprisingly, the model's performance hasn't changed much. I also notice that some elements of the output are now negative, and the elements across all classes do not sum to 1. Is this supposed to happen? Output:
```
tensor([[ 0.0165, -0.0041,  0.0043,  0.0017,  0.0238,  0.0329, -0.0265, -0.0224, -0.0187],
        [ 0.0163, -0.0044,  0.0036,  0.0028,  0.0248,  0.0334, -0.0268, -0.0218, -0.0194],
        [ 0.0161, -0.0046,  0.0041,  0.0019,  0.0240,  0.0333, -0.0266, -0.0223, -0.0192],
        [ 0.0190, -0.0044,  0.0035,  0.0015,  0.0244,  0.0322, -0.0267, -0.0223, -0.0187],
        [ 0.0174, -0.0048,  0.0033,  0.0021,  0.0251,  0.0328, -0.0257, -0.0225, -0.0190],
        [ 0.0175, -0.0041,  0.0033,  0.0031,  0.0241,  0.0329, -0.0264, -0.0222, -0.0192],
        [ 0.0168, -0.0042,  0.0033,  0.0022,  0.0251,  0.0335, -0.0269, -0.0225, -0.0195],
        [ 0.0163, -0.0047,  0.0037,  0.0030,  0.0243,  0.0336, -0.0265, -0.0227, -0.0192],
        [ 0.0165, -0.0043,  0.0038,  0.0026,  0.0242,  0.0337, -0.0264, -0.0222, -0.0191],
        [ 0.0163, -0.0051,  0.0038,  0.0016,  0.0236,  0.0338, -0.0258, -0.0223, -0.0195],
        [ 0.0173, -0.0037,  0.0038,  0.0018,  0.0236,  0.0322, -0.0269, -0.0225, -0.0191],
        [ 0.0174, -0.0044,  0.0031,  0.0019,  0.0241,  0.0334, -0.0266, -0.0224, -0.0200],
        [ 0.0164, -0.0038,  0.0034,  0.0029,  0.0245,  0.0342, -0.0269, -0.0225, -0.0200],
        [ 0.0173, -0.0046,  0.0036,  0.0021,  0.0245,  0.0328, -0.0264, -0.0221, -0.0192],
        [ 0.0168, -0.0046,  0.0034,  0.0025,  0.0248,  0.0336, -0.0262, -0.0222, -0.0194],
        [ 0.0166, -0.0051,  0.0033,  0.0015,  0.0234,  0.0331, -0.0270, -0.0218, -0.0186]],
       grad_fn=<AddmmBackward>)

Epoch 199: Train err: 0.8125, Train loss: 2.1874701976776123
```
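(For reference, those values are expected: without the final softmax, the network returns raw logits, which can be negative and need not sum to 1. A minimal sketch, using a made-up row of logits resembling the output above, shows how to recover probabilities when you actually need them for reporting:)

```python
import torch
import torch.nn.functional as F

# Made-up logits resembling one row of the output above (illustrative only)
logits = torch.tensor([[0.0165, -0.0041, 0.0043, 0.0017, 0.0238]])

# Apply softmax only when probabilities are needed (e.g. for display),
# never before nn.CrossEntropyLoss
probs = F.softmax(logits, dim=1)
print(probs)             # every entry is non-negative
print(probs.sum(dim=1))  # each row sums to 1
```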
Solution 1:[1]

You define a model named `overfit_model`, but you pass `over_model.parameters()` to the optimizer:

```python
optimizer = optim.SGD(over_model.parameters(), lr=0.001, momentum=0.9)
```

This should be `overfit_model.parameters()`.

You are also zeroing your gradients right after backpropagating, where it should be done beforehand. The lines

```python
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

should be reordered as:

```python
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Finally, there is no need to call `F.softmax` in `return F.softmax(self.fc3(x), dim=1)`, since you are using `nn.CrossEntropyLoss`, which calls `F.cross_entropy`; that function natively applies `log_softmax` before calling `nll_loss`:

```python
return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
```
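As a quick check of this point (random logits, purely illustrative), `nn.CrossEntropyLoss` applied to raw logits matches `nll_loss` composed with `log_softmax` applied manually:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.randn(4, 9)           # raw network outputs: 4 samples, 9 classes
labels = torch.tensor([0, 3, 8, 2])  # arbitrary target classes

ce = nn.CrossEntropyLoss()(logits, labels)
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)

# The two losses agree, so the network should output raw logits
assert torch.allclose(ce, manual)
```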
Solution 2:[2]
First of all, you should remove the softmax before `nn.CrossEntropyLoss`, as the other answers point out.

Now, the actual problem is that your CNN is very deep but your dataset is very small. Under these conditions you may need thousands of epochs, or the model may not converge at all. If you can enlarge your dataset or use a shallower model, you can overcome the convergence problem you are facing.

If you don't want to change those, here is my suggestion, quoting from here:

> The most basic method of hyper-parameter search is to do a grid search over the learning rate and batch size to find a pair which makes the network converge.

Let's talk about batch size. If your dataset is very small and your model is too deep, a moderate batch size (e.g. 32 or 16) may not work. Since you want to test that the model can overfit, try a very small batch size (4 or 8). This helps the optimizer find a local minimum for a small number of samples quickly.

If you have already tried this, try increasing the learning rate.
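A minimal sketch of such a grid search over learning rate and batch size (the tiny linear model and synthetic 16-sample dataset below are stand-ins, not the asker's actual setup):

```python
import itertools
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for the 16-image dataset: random inputs, 9 classes
X = torch.randn(16, 3, 8, 8)
y = torch.randint(0, 9, (16,))

def make_model():
    # Deliberately tiny model so the sketch runs quickly
    return nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 9))

criterion = nn.CrossEntropyLoss()
best = None
for lr, bs in itertools.product([0.1, 0.01, 0.001], [4, 8]):
    model = make_model()
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loader = DataLoader(TensorDataset(X, y), batch_size=bs, shuffle=True)
    for _ in range(20):                 # a short overfitting run per pair
        for xb, yb in loader:
            opt.zero_grad()
            loss = criterion(model(xb), yb)
            loss.backward()
            opt.step()
    final = criterion(model(X), y).item()
    if best is None or final < best[0]:
        best = (final, lr, bs)

print("best final loss {:.4f} at lr={}, batch_size={}".format(*best))
```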
Solution 3:[3]
The loss is not decreasing because you apply softmax before `CrossEntropyLoss`; the loss then applies `log_softmax` a second time, which squashes the outputs into a narrow range and drastically reduces the gradient flowing back through the network.

Just remove the softmax, and it should work.
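To see this gradient-shrinking effect numerically, compare the gradient magnitude at the logits with and without the extra softmax (toy values, purely illustrative):

```python
import torch
import torch.nn as nn

ce = nn.CrossEntropyLoss()
labels = torch.tensor([0])

# Correct: raw logits straight into CrossEntropyLoss
logits = torch.tensor([[2.0, -1.0, 0.5]], requires_grad=True)
ce(logits, labels).backward()
g_ok = logits.grad.abs().sum().item()

# Buggy: softmax first squashes values into [0, 1], so the loss
# sees nearly uniform "logits" and the gradient shrinks
logits_bug = torch.tensor([[2.0, -1.0, 0.5]], requires_grad=True)
ce(torch.softmax(logits_bug, dim=1), labels).backward()
g_bug = logits_bug.grad.abs().sum().item()

print(g_ok, g_bug)  # the buggy gradient is noticeably smaller
```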
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Mushfiqur Rahman |
| Solution 3 | Karan Dhingra |
