Loaded model gives a large loss; why doesn't training continue?
This is my first time running an ML model in PyTorch. My dataset is very large, so I want to save the model after some iterations and resume training if the program crashes partway through, instead of starting over. To check that a loaded model can continue training, I trained the model for two epochs (epoch in range(2)), saved it, then loaded it and trained for two more epochs (epoch in range(2, 4)). I found two problems: (1) the model did not update, since the losses for epochs 3 and 4 are identical, and (2) the new loss is very large. It seems that I either did not save the model correctly or did not load it correctly. Thanks in advance!
print("Trainloss", Trainloss)
print("R-square", r_squared)
*Trainloss [1470600.5, 0.8635099530220032]
R-square [-10589209.551380618, -5.086684550040197]*
save_path="./savedmodel.pth"
EPOCH = epoch
TRAIN_LOSS = Trainloss
Rsquare=r_squared
loss=loss
torch.save({
    'epoch': EPOCH,
    'Rsquare': Rsquare,
    'loss': loss,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'TRAIN_LOSS': Trainloss,
}, save_path)
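One way to check whether the save and load steps are themselves at fault is a round trip: save a checkpoint, load it into a freshly constructed model, and compare the outputs on the same input. The sketch below is only an illustration under assumptions: `TinyNet`, `save_checkpoint`, `load_checkpoint`, and the random data are hypothetical stand-ins, not the code from the question, and it runs on the CPU so it can be tried without a GPU.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim


class TinyNet(nn.Module):
    """Hypothetical stand-in for the question's NeuralNetwork."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(3, 1)

    def forward(self, x):
        return self.fc(x)


def save_checkpoint(path, model, optimizer, epoch, history):
    # Store everything needed to resume: weights, optimizer state, bookkeeping.
    torch.save(
        {
            "epoch": epoch,
            "model_state_dict": model.state_dict(),
            "optimizer_state_dict": optimizer.state_dict(),
            "history": history,
        },
        path,
    )


def load_checkpoint(path, model, optimizer, device):
    # The model and optimizer must already exist; this fills in their state.
    checkpoint = torch.load(path, map_location=device)
    model.load_state_dict(checkpoint["model_state_dict"])
    optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
    return checkpoint["epoch"], checkpoint["history"]


torch.manual_seed(0)
device = torch.device("cpu")  # assumption: CPU, so the sketch runs anywhere
x, y = torch.randn(8, 3), torch.randn(8, 1)

model = TinyNet().to(device)
optimizer = optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()
for _ in range(2):  # two toy training steps before saving
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

path = os.path.join(tempfile.mkdtemp(), "savedmodel.pth")
save_checkpoint(path, model, optimizer, epoch=1, history=[loss.item()])

# Build a new model and optimizer, then restore their state from disk.
restored = TinyNet().to(device)
restored_opt = optim.Adam(restored.parameters(), lr=1e-2)
epoch, history = load_checkpoint(path, restored, restored_opt, device)

with torch.no_grad():
    same_output = torch.allclose(model(x), restored(x))
print("restored epoch:", epoch, "outputs match:", same_output)
```

If the outputs match here but the loss still jumps after loading in the real script, the cause is more likely in how training resumes than in the checkpoint file.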
device = torch.device("cuda")
optimizer = optim.Adam(model.parameters(), lr = lr)
model = NeuralNetwork()
lr = 1e-2
checkpoint = torch.load(save_path)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
loss = checkpoint['loss']
r_squared=checkpoint['Rsquare']
Trainloss=checkpoint['TRAIN_LOSS']
epoch=checkpoint['epoch']
model.to(device)
model.train()
print(Trainloss)
print(r_squared)
*trainloss: [1470600.5, 0.8635099530220032, 439699.84375, 439699.84375]
r2: [-10589209.551380618, -5.086684550040197, -3161013.534033724, -3161013.534033724]*
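One possible explanation for the identical losses, judging only from the loading snippet as posted: `optimizer = optim.Adam(model.parameters(), lr = lr)` appears before `model = NeuralNetwork()`. If the code ran in that order (for example in a notebook where `model` and `lr` already existed), the optimizer would hold the parameters of the previous model object, and `optimizer.step()` would never touch the newly created one. The sketch below reproduces that effect on toy data; `make_model` and `train_one_epoch` are hypothetical helpers, and this does not by itself explain why the loss after loading is so large.

```python
import torch
import torch.nn as nn
import torch.optim as optim


def make_model():
    # Hypothetical stand-in for the question's NeuralNetwork.
    return nn.Linear(3, 1)


def train_one_epoch(model, optimizer, x, y):
    # One full-batch update; returns the loss measured before the step.
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()


torch.manual_seed(0)
x, y = torch.randn(16, 3), torch.randn(16, 1)
lr = 1e-2

# Order as posted: the optimizer captures the parameters of the OLD model.
model = make_model()
optimizer = optim.Adam(model.parameters(), lr=lr)
model = make_model()  # rebinding the name does not update the optimizer
stale_losses = [train_one_epoch(model, optimizer, x, y) for _ in range(2)]

# Reordered: create the model first, then the optimizer over its parameters.
model = make_model()
optimizer = optim.Adam(model.parameters(), lr=lr)
fresh_losses = [train_one_epoch(model, optimizer, x, y) for _ in range(2)]

print("optimizer bound to old model:", stale_losses)
print("optimizer bound to new model:", fresh_losses)
```

In the first case the two losses are identical because the new model's weights never change. For the real script, a safer order is: create the model, move it to the device, create the optimizer, and only then call `load_state_dict` on both. The `torch.optim` documentation also recommends moving a model to the GPU before constructing its optimizer.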
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
