What is the right way to gradually unfreeze layers in a neural network while training with TensorFlow?
I use transfer learning with EfficientNetB0, and what I'm trying to do is to gradually unfreeze layers while the network is training. At first, I train one dense layer on top of the whole network, while every other layer is frozen. I use this code to freeze the layers:

```python
for layer in model_base.layers[:-2]:
    layer.trainable = False
```
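For context, here is a minimal sketch of how this first stage might fit together end to end; the input shape, class count, and head layers below are illustrative assumptions, not taken from the question:

```python
import tensorflow as tf

# Sketch of stage 1: frozen EfficientNetB0 base, one trainable dense head.
model_base = tf.keras.applications.EfficientNetB0(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3)
)
for layer in model_base.layers:
    layer.trainable = False  # freeze the entire pre-trained base

model = tf.keras.Sequential([
    model_base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),  # assumed 10 classes
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```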
Then I unfreeze the whole model and freeze the exact layers I need using this code:

```python
model.trainable = True
for layer in model_base.layers[:-13]:
    layer.trainable = False
```
Everything works fine. I call model.compile one more time and it continues training from where it left off. Great. But then, when I unfreeze all layers one more time with

```python
model.trainable = True
```

and try to do fine-tuning, my model starts to learn from scratch.

I have tried different approaches and ways to fix this, but nothing seems to work. I also tried setting layer.training = False and layer.trainable = False for all the batch_normalization layers in the model, but that doesn't help either.
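For reference, `training` is a call-time argument rather than a layer attribute, so `layer.training = False` has no effect on its own. A minimal sketch of the commonly recommended way to keep BatchNormalization layers inert during fine-tuning, reusing the `model_base` from the question:

```python
import tensorflow as tf

# Unfreeze the base but keep every BatchNormalization layer frozen. In
# TF 2.x, setting trainable=False on a BatchNormalization layer also
# makes it run in inference mode, so its moving statistics stay fixed.
model_base.trainable = True
for layer in model_base.layers:
    if isinstance(layer, tf.keras.layers.BatchNormalization):
        layer.trainable = False

# With the Functional API, the base can additionally be called in
# inference mode, regardless of `trainable`:
#   x = model_base(inputs, training=False)
```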
Solution 1:[1]
This tends to be application-specific and not every problem can benefit from retraining the whole neural network.
> my model starts to learn from scratch
While this is most likely not the case (weights are not reinitialized), it can definitely seem like that. Your model has been fine-tuned to some other task and now you are forcing it to retrain itself to do something different.
If you are observing behavior like that, the most likely cause is that you are simply using too large a learning rate, which destroys the fine-tuned weights of the original model.
Retraining the whole model as you have described (the final step) should be done very, very carefully with a very small learning rate (I have seen instances where Adam with a 10^-8 learning rate was too much).
My advice is to keep lowering the learning rate until it starts improving instead of damaging the weights, but this may lead to a learning rate so small that it is of no practical use.
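To illustrate this advice, the final stage might be recompiled along these lines; the 1e-5 value and the `train_ds`/`val_ds` datasets are assumptions, only a starting point rather than a prescription:

```python
import tensorflow as tf

# Final stage: everything trainable, but with a drastically reduced
# learning rate so the pre-trained weights are nudged, not overwritten.
model.trainable = True
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),  # lower it further if loss diverges
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
model.fit(train_ds, validation_data=val_ds, epochs=5)  # datasets assumed to exist
```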
Solution 2:[2]
In addition to the previous answer, I would like to point out one often-overlooked factor: freezing/unfreezing also depends on the problem you are trying to solve, i.e.
- The similarity between your own dataset and the dataset on which the network was pre-trained.
- The size of the new dataset.

You should consult the diagram in the article linked below before making a decision.
Moreover, note that if you are constrained by hardware, you can opt to leave some of the layers completely frozen, since that way you have a smaller number of trainable parameters (a sketch of this variant follows below).
Picture taken from here (although I remember having seen it in several blogs): https://towardsdatascience.com/transfer-learning-from-pre-trained-models-f2393f124751
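A sketch of that hardware-constrained variant, reusing `model_base` from the question; the cut-off index is arbitrary:

```python
import tensorflow as tf

# Keep the earliest layers permanently frozen to reduce the number of
# trainable parameters (and the memory their gradients would need).
model_base.trainable = True
for layer in model_base.layers[:100]:  # arbitrary cut-off, for illustration only
    layer.trainable = False

n_trainable = sum(
    tf.keras.backend.count_params(w) for w in model_base.trainable_weights
)
print(f"Trainable parameters in the base: {n_trainable:,}")
```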
Solution 3:[3]
I've encountered this problem before. It seems that if I construct my model with the Sequential API, the network starts learning from scratch when I set base_model.trainable = True. But if I create my model with the Functional API, everything seems to be okay. I create my model the same way as in this official tutorial: https://www.tensorflow.org/tutorials/images/transfer_learning
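For comparison, the Functional API construction from that tutorial looks roughly like this; the input shape and the head are assumptions:

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(224, 224, 3))
# Calling the base with training=False keeps it in inference mode (e.g.
# BatchNormalization keeps using its moving statistics) even after
# model_base.trainable is later flipped to True for fine-tuning.
x = model_base(inputs, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(10, activation="softmax")(x)  # assumed head
model = tf.keras.Model(inputs, outputs)
```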
Solution 4:[4]
The way you freeze and unfreeze your layers is correct, and that is how it is done on the official website:

> Setting layer.trainable to False moves all the layer's weights from trainable to non-trainable.

From https://keras.io/guides/transfer_learning/
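A quick way to observe that movement of weights, adapted from the same guide:

```python
import tensorflow as tf

layer = tf.keras.layers.Dense(4)
layer.build((None, 8))  # materialize the kernel and bias

print(len(layer.trainable_weights))      # 2 (kernel + bias)
print(len(layer.non_trainable_weights))  # 0

layer.trainable = False
print(len(layer.trainable_weights))      # 0
print(len(layer.non_trainable_weights))  # 2 -- the weights moved, not vanished
```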
As discussed in the other answers, the problem you encounter is indeed theoretical and has nothing to do with the way you programmed it.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Matus Dubrava |
| Solution 2 | |
| Solution 3 | |
| Solution 4 | SashimiDélicieux |
