PyTorch transfer learning accuracy and losses not improving (ResNet-50 and CIFAR-10)
I have been trying everything to fix this issue, but my results stay the same: my validation accuracy, train_loss, and val_loss are not improving. I have no idea what to do anymore.
I am currently using the ResNet-50 model pre-trained on the ImageNet dataset, with normalization values mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225]. I am trying to fine-tune the model for CIFAR-10. I have frozen gradient calculation for all layers except the last one, since I only need to fine-tune the fully connected layer.
I trained the model for 10 epochs with a learning rate of 0.001 (see the logs below). I have tried an lr_scheduler, weight decay, gradient clipping, etc., but I can't find the cause of the problem.
Let me know if you need more information.
Results
```
Epoch [0], last_lr: 0.00100, train_loss: 1.7113, val_loss: 1.4291, val_acc: 0.6011
Epoch [1], last_lr: 0.00100, train_loss: 1.7026, val_loss: 1.4912, val_acc: 0.5179
Epoch [2], last_lr: 0.00100, train_loss: 1.7056, val_loss: 1.4993, val_acc: 0.5151
Epoch [3], last_lr: 0.00070, train_loss: 1.6603, val_loss: 1.5118, val_acc: 0.4944
Epoch [4], last_lr: 0.00070, train_loss: 1.6568, val_loss: 1.4137, val_acc: 0.5750
Epoch [5], last_lr: 0.00070, train_loss: 1.6601, val_loss: 1.4724, val_acc: 0.5010
Epoch [6], last_lr: 0.00049, train_loss: 1.6167, val_loss: 1.3665, val_acc: 0.6519
Epoch [7], last_lr: 0.00049, train_loss: 1.6277, val_loss: 1.3846, val_acc: 0.6340
Epoch [8], last_lr: 0.00049, train_loss: 1.6185, val_loss: 1.4177, val_acc: 0.5657
Epoch [9], last_lr: 0.00034, train_loss: 1.6047, val_loss: 1.3777, val_acc: 0.5945
```
Here are my training steps:

```python
@torch.no_grad()
def evaluate(model, val_loader):
    model.eval()
    outputs = [validation_step(model, batch) for batch in val_loader]
    return validation_epoch_end(outputs)

# Getting the current learning rate
def get_lr(optimizer):
    for param_group in optimizer.param_groups:
        return param_group['lr']

def fit_one_cycle(epochs, max_lr, model, train_loader, val_loader):
    torch.cuda.empty_cache()
    history = []

    # Set up custom optimizer with weight decay
    optimizer = optim.Adam(model.parameters(), max_lr, weight_decay=0.1)
    # Set up step-decay learning rate scheduler
    sched = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.7)

    for epoch in range(epochs):
        # Training phase
        model.train()
        train_losses = []
        lrs = []
        for batch in train_loader:
            optimizer.zero_grad()
            loss = training_step(model, batch)
            train_losses.append(loss.detach())  # detach so the graph isn't kept alive
            loss.backward()

            # Gradient clipping
            nn.utils.clip_grad_value_(model.parameters(), 0.01)

            optimizer.step()
            lrs.append(get_lr(optimizer))
        sched.step()

        # Validation phase
        result = evaluate(model, val_loader)
        result['train_loss'] = torch.stack(train_losses).mean().item()
        result['lrs'] = lrs
        epoch_end(epoch, result)
        history.append(result)
    return history
```
Here are my helper functions to train my model:

```python
def accuracy(outputs, labels):
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

def training_step(model, batch):
    images, labels = batch
    images, labels = images.to(device), labels.to(device)
    out = model(images)                              # Generate predictions
    loss = nn.functional.cross_entropy(out, labels)  # Calculate loss
    return loss

def validation_step(model, batch):
    images, labels = batch
    images, labels = images.to(device), labels.to(device)
    out = model(images)                              # Generate predictions
    loss = nn.functional.cross_entropy(out, labels)  # Calculate loss
    acc = accuracy(out, labels)                      # Calculate accuracy
    return {'val_loss': loss.detach(), 'val_acc': acc}

def validation_epoch_end(outputs):
    batch_losses = [x['val_loss'] for x in outputs]
    epoch_loss = torch.stack(batch_losses).mean()    # Combine losses
    batch_accs = [x['val_acc'] for x in outputs]
    epoch_acc = torch.stack(batch_accs).mean()       # Combine accuracies
    return {'val_loss': epoch_loss.item(), 'val_acc': epoch_acc.item()}

def epoch_end(epoch, result):
    print("Epoch [{}], last_lr: {:.5f}, train_loss: {:.4f}, val_loss: {:.4f}, val_acc: {:.4f}".format(
        epoch, result['lrs'][-1], result['train_loss'],
        result['val_loss'], result['val_acc']))
```
I have tried numerous techniques such as data augmentation, gradient clipping, an lr scheduler, changing the learning rate, and changing the training steps, but nothing seems to work. I can't find any difference between my code and the code from the documentation, and I have no idea how to further optimize/improve my model.
A model I built before this one reached 75% accuracy on the validation set. I wanted to improve on that by implementing transfer learning, but it got worse instead. I was using CIFAR-10 in both cases.
Solution 1:[1]
Surprisingly, I had the same problem; I was searching the Internet for an answer and found your post.
I later read that the problem is with CIFAR-10 itself:
CIFAR-10 consists of 32×32 images, which are too small for the ResNet-50 architecture, so you have to resize them to 224×224 before passing them to the network.
So basically you should change your transforms to:
```python
import torchvision.transforms as transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
```
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Mahmoud Hussein |
