How to fix "initial_lr not specified when resuming optimizer" error for scheduler?
In PyTorch I have configured SGD like this:
```python
sgd_config = {
    'params': net.parameters(),
    'lr': 1e-7,
    'weight_decay': 5e-4,
    'momentum': 0.9
}
optimizer = SGD(**sgd_config)
```
My requirements are:
- Total epochs are 100
- Every 30 epochs learning rate is decreased by a factor of 10
- Decreasing learning rate will stop at 60 epochs
So over 100 epochs the learning rate is decreased twice, each time by a factor of 10.
I read about the learning rate schedulers available in torch.optim.lr_scheduler, so I decided to try one instead of manually adjusting the learning rate:
```python
scheduler = lr_scheduler.StepLR(optimizer, step_size=30, last_epoch=60, gamma=0.1)
```
However, I am getting:
```
Traceback (most recent call last):
  File "D:\Projects\network\network_full.py", line 370, in <module>
    scheduler = lr_scheduler.StepLR(optimizer, step_size=30, last_epoch=90, gamma=0.1)
  File "D:\env\test\lib\site-packages\torch\optim\lr_scheduler.py", line 367, in __init__
    super(StepLR, self).__init__(optimizer, last_epoch, verbose)
  File "D:\env\test\lib\site-packages\torch\optim\lr_scheduler.py", line 39, in __init__
    raise KeyError("param 'initial_lr' is not specified "
KeyError: "param 'initial_lr' is not specified in param_groups[0] when resuming an optimizer"
```
I read a post here and I still don't understand how to use the scheduler for my scenario. Maybe I am just not understanding the definition of last_epoch, given that the documentation is very brief on this parameter:

> last_epoch (int) – The index of last epoch. Default: -1.
Since the argument is exposed to the user and there is no explicit prohibition on running a scheduler for fewer epochs than the optimizer itself, I am starting to think it's a bug.
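For reference, the check that produces this error can be reproduced in isolation. A minimal sketch (a tiny `torch.nn.Linear` is a hypothetical stand-in for `net`): any `last_epoch` other than -1 makes the scheduler treat construction as a resume, and a resume requires an `initial_lr` entry in every param group:

```python
import torch
from torch.optim import SGD, lr_scheduler

# Hypothetical stand-in for `net` from the question.
net = torch.nn.Linear(2, 2)
optimizer = SGD(net.parameters(), lr=1e-7, weight_decay=5e-4, momentum=0.9)

# Any last_epoch other than -1 means "I am resuming", so the scheduler
# looks for 'initial_lr' in each param group and fails if it is missing.
raised = False
try:
    lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1, last_epoch=60)
except KeyError as err:
    raised = True
    print(err)

# When genuinely resuming, record the starting LR first and the same
# constructor call succeeds.
for group in optimizer.param_groups:
    group.setdefault('initial_lr', group['lr'])
scheduler = lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1, last_epoch=60)
```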
Solution 1:[1]
You have misunderstood the last_epoch argument and you are not using the correct learning rate scheduler for your requirements.
This should work (milestones at epochs 30 and 60, matching where the decays should happen):
```python
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30, 60], gamma=0.1, last_epoch=args.current_epoch - 1)
```
The last_epoch argument makes sure the correct LR is used when resuming training: it is the index of the last completed epoch, not the epoch at which stepping should stop. It defaults to -1, i.e. the epoch before epoch 0, meaning training starts from scratch. Any other value tells the scheduler you are resuming an optimizer, which is why it then expects an 'initial_lr' entry in each of the optimizer's param groups.
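Putting it together for the schedule in the question, a minimal sketch (a tiny `torch.nn.Linear` is a hypothetical stand-in for `net`, and training starts from scratch so `last_epoch` is left at its default of -1). The LR drops by a factor of 10 at epochs 30 and 60 and then stays constant:

```python
import torch
from torch.optim import SGD, lr_scheduler

# Hypothetical stand-in for `net` from the question.
net = torch.nn.Linear(2, 2)

sgd_config = {
    'params': net.parameters(),
    'lr': 1e-7,
    'weight_decay': 5e-4,
    'momentum': 0.9,
}
optimizer = SGD(**sgd_config)

# MultiStepLR multiplies the LR by gamma once at each milestone epoch.
scheduler = lr_scheduler.MultiStepLR(optimizer, milestones=[30, 60], gamma=0.1)

lrs = []
for epoch in range(100):
    lrs.append(optimizer.param_groups[0]['lr'])
    # ... forward/backward pass would go here ...
    optimizer.step()   # step the optimizer before the scheduler (PyTorch >= 1.1)
    scheduler.step()

# lrs[0:30] ~ 1e-7, lrs[30:60] ~ 1e-8, lrs[60:] ~ 1e-9
```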
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | ShinyDemon |
