PyTorch transfer learning accuracy and losses not improving (ResNet-50, CIFAR-10)

I have tried everything to fix this issue, but my results are still the same: my validation accuracy, training loss, and validation loss are not improving. I have no idea what else to do.

I am currently using a ResNet-50 model pre-trained on the ImageNet dataset, with normalization values mean [0.485, 0.456, 0.406] and std [0.229, 0.224, 0.225]. I am trying to fine-tune the model for the CIFAR-10 dataset. I have frozen gradient calculation for all layers except the last one, since I only need to fine-tune the final fully connected layer.

I trained the model for 10 epochs with a learning rate of 0.01. I have tried a learning-rate scheduler, weight decay, gradient clipping, etc., but I can't find the source of the problem.

Let me know if you need more information.

Results

Epoch [0], last_lr: 0.00100, train_loss: 1.7113, val_loss: 1.4291, val_acc: 0.6011
Epoch [1], last_lr: 0.00100, train_loss: 1.7026, val_loss: 1.4912, val_acc: 0.5179
Epoch [2], last_lr: 0.00100, train_loss: 1.7056, val_loss: 1.4993, val_acc: 0.5151
Epoch [3], last_lr: 0.00070, train_loss: 1.6603, val_loss: 1.5118, val_acc: 0.4944
Epoch [4], last_lr: 0.00070, train_loss: 1.6568, val_loss: 1.4137, val_acc: 0.5750
Epoch [5], last_lr: 0.00070, train_loss: 1.6601, val_loss: 1.4724, val_acc: 0.5010
Epoch [6], last_lr: 0.00049, train_loss: 1.6167, val_loss: 1.3665, val_acc: 0.6519
Epoch [7], last_lr: 0.00049, train_loss: 1.6277, val_loss: 1.3846, val_acc: 0.6340
Epoch [8], last_lr: 0.00049, train_loss: 1.6185, val_loss: 1.4177, val_acc: 0.5657
Epoch [9], last_lr: 0.00034, train_loss: 1.6047, val_loss: 1.3777, val_acc: 0.5945

Here are my training steps:

@torch.no_grad()
def evaluate(model, val_loader):
    model.eval()
    outputs = [validation_step(model, batch) for batch in val_loader]
    return validation_epoch_end(outputs)

# Get the current learning rate from the optimizer
def get_lr(optimizer):
    for param_group in optimizer.param_groups:
        return param_group['lr']

def fit_one_cycle(epochs, max_lr, model, train_loader, val_loader):
    torch.cuda.empty_cache()
    history = []

    # Set up custom optimizer with weight decay
    optimizer = optim.Adam(model.parameters(), max_lr, weight_decay=0.1)

    # Set up step learning rate scheduler
    sched = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.7)

    for epoch in range(epochs):
        # Training phase
        model.train()
        train_losses = []
        lrs = []
        for batch in train_loader:
            optimizer.zero_grad()
            loss = training_step(model, batch)
            train_losses.append(loss)
            loss.backward()

            # Gradient clipping
            nn.utils.clip_grad_value_(model.parameters(), 0.01)

            optimizer.step()
            lrs.append(get_lr(optimizer))
        sched.step()

        # Validation phase
        result = evaluate(model, val_loader)
        result['train_loss'] = torch.stack(train_losses).mean().item()
        result['lrs'] = lrs
        epoch_end(epoch, result)
        history.append(result)
    return history

Here are my helper functions to train my model:

def accuracy(outputs, labels):
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

def training_step(model, batch):
    images, labels = batch
    images, labels = images.to(device), labels.to(device)
    out = model(images)                               # Generate predictions
    loss = nn.functional.cross_entropy(out, labels)   # Calculate loss
    return loss

def validation_step(model, batch):
    images, labels = batch
    images, labels = images.to(device), labels.to(device)
    out = model(images)                               # Generate predictions
    loss = nn.functional.cross_entropy(out, labels)   # Calculate loss
    acc = accuracy(out, labels)                       # Calculate accuracy
    return {'val_loss': loss.detach(), 'val_acc': acc}

def validation_epoch_end(outputs):
    batch_losses = [x['val_loss'] for x in outputs]
    epoch_loss = torch.stack(batch_losses).mean()     # Combine losses
    batch_accs = [x['val_acc'] for x in outputs]
    epoch_acc = torch.stack(batch_accs).mean()        # Combine accuracies
    return {'val_loss': epoch_loss.item(), 'val_acc': epoch_acc.item()}

def epoch_end(epoch, result):
    print("Epoch [{}], last_lr: {:.5f}, train_loss: {:.4f}, val_loss: {:.4f}, val_acc: {:.4f}".format(
        epoch, result['lrs'][-1], result['train_loss'], result['val_loss'], result['val_acc']))

I tried numerous techniques, such as data augmentation, gradient clipping, a learning-rate scheduler, and changing the learning rate and training steps, but nothing seems to work. I can't find any difference between my code and the code from the documentation. I have no idea how to further optimize/improve my model.

I previously created a model that reached 75% accuracy on the validation set, and I wanted to improve on it by implementing transfer learning, but the results got worse this time. I used CIFAR-10 in both cases.



Solution 1:[1]

Surprisingly, I had the same problem, searched the Internet for an answer, and found your post.

Later I read that the problem is with CIFAR-10:

CIFAR-10 is based on 32×32 images, which is not suitable for the ResNet-50 architecture, so you have to resize them to 224×224 before passing them to the network.

So basically you should change your transforms to:

import torchvision.transforms as transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Mahmoud Hussein