Training loss is increasing in CNN?

I am training my first CNN to solve a multi-class classification problem. I am feeding in images of animals, each belonging to one of 182 classes, but I have run into some issues. First, my code appears to get stuck on optimizer.step(); it has been on this call for roughly 30 minutes. Second, my training loss is increasing:

EPOCH: 0 BATCH: 1999 LOSS: 1.5790680234357715
EPOCH: 0 BATCH: 3999 LOSS: 2.9340945997834207

If anyone could provide some guidance, that would be greatly appreciated. Below is my code:

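import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import transforms

# loading data (`dataset` and `get_train_loader` are defined elsewhere, not shown)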
train_data = dataset.get_subset(
    "train",
    transform=transforms.Compose(
        [transforms.Resize((448, 448)), transforms.ToTensor()]
    ),
)

train_loader = get_train_loader("standard", train_data, batch_size=16)

# defining model
class ConvNet(nn.Module):

  def __init__(self):
    super(ConvNet, self).__init__()
    self.conv1 = nn.Conv2d(3, 6, 3, 1)
    self.conv2 = nn.Conv2d(6, 16, 3, 3)
    self.fc1 = nn.Linear(37*37*16, 120)
    self.fc2 = nn.Linear(120, 84)
    self.fc3 = nn.Linear(84, 182)

  def forward(self, X):
    X = F.relu(self.conv1(X))
    X = F.max_pool2d(X, 2, 2)
    X = F.relu(self.conv2(X))
    X = F.max_pool2d(X, 2, 2)
    X = torch.flatten(X, 1)
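    X = F.relu(self.fc1(X))
    X = F.relu(self.fc2(X))
    X = self.fc3(X)
    # note: nn.CrossEntropyLoss applies log-softmax to its input itself,
    # so this extra log_softmax is redundant
    return F.log_softmax(X, dim=1)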

# instantiate the model before handing its parameters to the optimizer
modell = ConvNet()

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(modell.parameters(), lr=0.001)

import time

start_time = time.time()

#VARIABLES  (TRACKER)
epochs = 2
train_losses = []
test_losses = []
train_correct = []
test_correct = []

# FOR LOOP EPOCH
for i in range(epochs):
  trn_corr = 0
  tst_corr = 0

  running_loss = 0.0
  #TRAIN
  for b, (X_train, Y_train, meta) in enumerate(train_loader):
    
    b+=1 #batch starts at 1

    #zero parameter gradients
    optimizer.zero_grad()

    # pass training to model as float (later compute loss)
    output = modell(X_train.float())

    #Calculate the loss of outputs with respect to ground truth values
    loss = criterion(output, Y_train)

    #Backpropagate the loss through the network
    loss.backward()

    #perform parameter update based on the current gradient
    optimizer.step()

    predicted = torch.max(output.data, 1)[1]


    batch_corr = (predicted == Y_train).sum() # True (1) or False (0)
    trn_corr += batch_corr

    running_loss += loss.item()

    if b%2000 == 1999:
      print(f"EPOCH: {i} BATCH: {b} LOSS: {running_loss/2000}")
      running_loss = 0.0

  # record per-epoch results (store a plain float, not the loss tensor)
  train_losses.append(loss.item())
  train_correct.append(trn_corr)
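
As a sanity check on the 37*37*16 input size of fc1, the spatial size after each layer can be worked out by hand. The sketch below is only an illustration, assuming 448x448 inputs, no padding, and dilation 1 (the defaults used by the model above); conv_out and flattened_features are hypothetical helper names, not part of PyTorch.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Output length along one spatial dimension for a conv or max-pool
    layer with dilation 1: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(side: int = 448, channels: int = 16) -> int:
    """Trace one spatial dimension through the layers of ConvNet."""
    side = conv_out(side, kernel=3, stride=1)  # conv1: 448 -> 446
    side = conv_out(side, kernel=2, stride=2)  # max_pool2d: 446 -> 223
    side = conv_out(side, kernel=3, stride=3)  # conv2: 223 -> 74
    side = conv_out(side, kernel=2, stride=2)  # max_pool2d: 74 -> 37
    return side * side * channels


print(flattened_features())  # 21904, i.e. 37 * 37 * 16
```

Under these assumptions the result matches the in_features passed to fc1, so the layer sizes themselves are consistent.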

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
