How to fix "initial_lr not specified when resuming optimizer" error for scheduler?

In PyTorch I have configured SGD like this:

sgd_config = {
    'params' : net.parameters(),
    'lr' : 1e-7,
    'weight_decay' : 5e-4,
    'momentum' : 0.9
}
optimizer = SGD(**sgd_config)

My requirements are:

  • Total epochs are 100
  • Every 30 epochs learning rate is decreased by a factor of 10
  • Decreasing learning rate will stop at 60 epochs

So over 100 epochs the learning rate should be decreased by a factor of 10 exactly twice.

I read about the learning rate schedulers available in torch.optim.lr_scheduler, so I decided to try one instead of manually adjusting the learning rate:

scheduler = lr_scheduler.StepLR(optimizer, step_size=30, last_epoch=60, gamma=0.1)

However I am getting

Traceback (most recent call last):
  File "D:\Projects\network\network_full.py", line 370, in <module>
    scheduler = lr_scheduler.StepLR(optimizer, step_size=30, last_epoch=90, gamma=0.1)
  File "D:\env\test\lib\site-packages\torch\optim\lr_scheduler.py", line 367, in __init__
    super(StepLR, self).__init__(optimizer, last_epoch, verbose)
  File "D:\env\test\lib\site-packages\torch\optim\lr_scheduler.py", line 39, in __init__
    raise KeyError("param 'initial_lr' is not specified "
KeyError: "param 'initial_lr' is not specified in param_groups[0] when resuming an optimizer"

I read a post here and I still don't get how I would use the scheduler for my scenario. Maybe I am just not understanding the definition of last_epoch, given that the documentation is very brief on this parameter:

last_epoch (int) – The index of last epoch. Default: -1.

Since the argument is exposed to the user and there is no explicit prohibition on running a scheduler for fewer epochs than the optimizer itself, I am starting to think it's a bug.
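[Editor's note: the KeyError itself comes from the resume path. Passing last_epoch != -1 makes the scheduler assume it is resuming an interrupted run, so it looks for an 'initial_lr' entry in each param group and raises when the key is absent. A minimal sketch reproducing and silencing that specific error, using a stand-in Linear layer for the network; this does not fix the scheduling logic, only the exception:]

```python
import torch
from torch.optim import SGD, lr_scheduler

net = torch.nn.Linear(10, 2)  # stand-in for the real network
optimizer = SGD(net.parameters(), lr=1e-7, weight_decay=5e-4, momentum=0.9)

# With last_epoch != -1 the scheduler assumes it is *resuming*, so it
# expects an 'initial_lr' key in every param group (normally written by a
# scheduler on its first construction). Adding it manually avoids the error:
for group in optimizer.param_groups:
    group.setdefault('initial_lr', group['lr'])

scheduler = lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1, last_epoch=60)
```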



Solution 1:[1]

You have misunderstood the last_epoch argument, and you are not using the right learning rate scheduler for your requirements.

This should work:

optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30, 60], gamma=0.1, last_epoch=args.current_epoch - 1)

The last_epoch argument is not the epoch at which scheduling stops; it is the index of the last completed epoch, and it exists so that the correct LR is restored when resuming training. It defaults to -1, i.e. the epoch before epoch 0, meaning training starts from scratch. With MultiStepLR the schedule stops decaying on its own once the last milestone (60) is passed, which matches your requirements.
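A runnable sketch of the full schedule for a fresh run (so last_epoch stays at its default of -1; the Linear layer is a stand-in for the real network):

```python
import torch
from torch.optim import SGD, lr_scheduler

net = torch.nn.Linear(10, 2)  # stand-in for the real network
optimizer = SGD(net.parameters(), lr=1e-7, weight_decay=5e-4, momentum=0.9)
# Decay by 10x at epochs 30 and 60, then hold steady until epoch 100:
scheduler = lr_scheduler.MultiStepLR(optimizer, milestones=[30, 60], gamma=0.1)

lrs = []
for epoch in range(100):
    lrs.append(optimizer.param_groups[0]['lr'])  # LR used for this epoch
    # ... forward pass, loss.backward() ...
    optimizer.step()
    scheduler.step()  # once per epoch, after optimizer.step()
```

This yields 1e-7 for epochs 0-29, 1e-8 for epochs 30-59, and 1e-9 from epoch 60 onward, i.e. exactly two decreases.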

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: ShinyDemon