How to define a specific keras layer weight as non-trainable?

Let's suppose we have a neural net with three layers: Inputs > Hidden > Outputs, and consider that the weights between the Hidden and Outputs layers are W and b, where W is a matrix of shape (N, M). By default, all components of W and b are set as trainable in Keras. I know how to set the entire W or b as non-trainable, as in the link below:

How to set parameters in keras to be non-trainable?

What I want is to be able to set only a specific component of W (for example) as non-trainable. For instance, if:

W = [[W11, W12]
     [W21, W22]]

which can be rewritten as:

W = [W1, W2] with W1 = [W11, W12] and W2 = [W21, W22]

and both W1 and W2 are of type tf.Variable,

how would I set, for instance, W1 as non-trainable?

I looked at some other topics, but none of them helped me get what I want. Some example links are below:

Link 1 : https://keras.io/guides/transfer_learning/

Link 2 : https://github.com/tensorflow/tensorflow/issues/47597

Can anyone help me to solve this?

Thank you in advance



Solution 1:[1]

The tensor W is stored as a single tf.Variable (not four variables w11, w12, w21, w22), and tf.Variable.trainable controls entire tensors, not sub-tensors. Worse yet, inside a Keras layer, all variables share the same trainable behaviour, because they are controlled by the tf.keras.layers.Layer.trainable attribute.
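As a small illustration of that first point (not from the original answer, shapes chosen just for the example), trainable is a property of a whole variable, and gradients are computed for the full tensor:

import tensorflow as tf

W = tf.Variable(tf.ones((2, 2)), trainable=True)   # the whole tensor is trainable
b = tf.Variable(tf.zeros((2,)), trainable=False)   # the whole tensor is frozen

with tf.GradientTape() as tape:
    loss = tf.reduce_sum(tf.ones((1, 2)) @ W + b)

grads = tape.gradient(loss, [W, b])
print(grads[0].shape)  # (2, 2) -- one gradient for all of W, there is no per-row switch
print(grads[1])        # None   -- b is not watched because trainable=False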

To do what you want, you'd create two variables W1 and W2, each wrapped in a different layer instance. You'd apply each layer to the input, each producing half the answer, and then concatenate to get the complete answer.
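A minimal sketch of that idea, assuming a functional-API model and illustrative layer names (trainable_part, frozen_part):

import tensorflow as tf
from tensorflow.keras import layers, Model

inputs = tf.keras.Input(shape=(2,))

# each branch holds part of the combined weight matrix
trainable_part = layers.Dense(1, use_bias=False, name='trainable_part')
frozen_part = layers.Dense(1, use_bias=False, name='frozen_part')
frozen_part.trainable = False  # this half is never updated

# apply both layers to the same input and stitch the partial outputs back together
outputs = layers.Concatenate()([trainable_part(inputs), frozen_part(inputs)])

model = Model(inputs, outputs)
print([v.name for v in model.trainable_weights])  # only the 'trainable_part' kernel

Only the kernel of trainable_part shows up in model.trainable_weights, so the optimizer never touches the frozen half.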

Solution 2:[2]

You can create your own layers in Keras. This lets you customize the weights within your layers, e.g., whether or not they are trainable.

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' # suppress Tensorflow messages
import tensorflow as tf
from keras.layers import *
from keras.models import *

# Your custom layer
class Linear(Layer):
    def __init__(self, units=32,**kwargs):
        super(Linear, self).__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        # kernel w: trainable
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="random_normal",
            trainable=True,
        )
        # bias b: non-trainable
        self.b = self.add_weight(
            shape=(self.units,), initializer="random_normal", trainable=False
        )

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

In Linear, the weights w are trainable and the bias b is not. Below, I create a training loop on dummy data to visualize how the weights update.

batch_size = 10
input_shape = (5, 5)  # per-sample shape; the batch dimension is added separately


## model
model = Sequential()
model.add(Input(shape=input_shape))
model.add(Linear(units=4, name='my_linear_layer'))
model.add(Dense(1))


## dummy dataset
x = tf.random.normal((batch_size,) + input_shape)  # dummy input, shape (10, 5, 5)
y = tf.ones((batch_size, 5, 1))                    # dummy target, matching the model output shape

## loss functions and optimizer
loss_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-2)


### training loop 
epochs = 3
for epoch in range(epochs):
  print("\nStart of epoch %d" % (epoch,))

  tf.print(model.get_layer('my_linear_layer').get_weights())

  # Open a GradientTape to record the operations run
  # during the forward pass, which enables auto-differentiation.
  with tf.GradientTape() as tape:

    # Run the forward pass of the layer.
    # The operations that the layer applies
    # to its inputs are going to be recorded
    # on the GradientTape.
    logits = model(x, training=True)  # Logits for this minibatch

    # Compute the loss value for this minibatch.
    loss_value = loss_fn(y, logits)

  # Use the gradient tape to automatically retrieve
  # the gradients of the trainable variables with respect to the loss.
  grads = tape.gradient(loss_value, model.trainable_weights)

  # Run one step of gradient descent by updating
  # the value of the variables to minimize the loss.
  optimizer.apply_gradients(zip(grads, model.trainable_weights))

Running this loop prints the weights of my_linear_layer at the start of each epoch:

Start of epoch 0
[array([[ 0.08920084, -0.04294993,  0.06111819,  0.08334437],
       [-0.0369432 , -0.05014499,  0.0305218 , -0.07486793],
       [-0.01227043,  0.09460627, -0.0560123 ,  0.01324316],
       [-0.00255878,  0.00214959, -0.02924518,  0.04721532],
       [-0.05532415, -0.02014978, -0.06785563, -0.07330619]],
      dtype=float32),
 array([ 0.02154647,  0.05153348, -0.00128291, -0.06794706], dtype=float32)]

Start of epoch 1
[array([[ 0.08961578, -0.04327399,  0.06152926,  0.08325274],
       [-0.03829437, -0.04908974,  0.02918325, -0.07456956],
       [-0.01417133,  0.09609085, -0.05789544,  0.01366292],
       [-0.00236284,  0.00199657, -0.02905108,  0.04717206],
       [-0.05536905, -0.02011472, -0.06790011, -0.07329627]],
      dtype=float32),
 array([ 0.02154647,  0.05153348, -0.00128291, -0.06794706], dtype=float32)]

Start of epoch 2
[array([[ 0.09001605, -0.04358549,  0.06192534,  0.08316355],
       [-0.03960795, -0.04806747,  0.02788337, -0.07427685],
       [-0.01599812,  0.09751251, -0.05970317,  0.01406999],
       [-0.00217021,  0.00184666, -0.02886046,  0.04712913],
       [-0.05540781, -0.02008455, -0.06793848, -0.07328764]],
      dtype=float32),
 array([ 0.02154647,  0.05153348, -0.00128291, -0.06794706], dtype=float32)]

As you can see, the weights w are updated while the bias b stays constant.
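If you want to confirm this split directly, you can also inspect the layer's weight collections (a quick check, not shown in the original answer):

layer = model.get_layer('my_linear_layer')
print([v.shape for v in layer.trainable_weights])      # [(5, 4)] -> w
print([v.shape for v in layer.non_trainable_weights])  # [(4,)]   -> b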

Solution 3:[3]

I'm trying to solve a similar problem at the moment. What you need to do is first use the functional API of Keras. Then put all the weights that you want to be trainable into one layer and all the weights you want to be non-trainable into another layer, and feed the previous layer's output into both of them. You can then use a Concatenate layer to combine the two outputs back together. So say you had a hidden layer with 5 neurons, 3 of which you wanted to be trainable and 2 of which you wanted to be non-trainable:

    X = Dense(5, activation='relu')(X)  # previous layer

    Y = Dense(3, activation='relu', name='trainable_layer')(X)

    # set trainable on the layer object before calling it;
    # assigning .trainable on the output tensor Z has no effect
    non_trainable = Dense(2, activation='relu', name='non_trainable_layer')
    non_trainable.trainable = False
    Z = non_trainable(X)

    X = Concatenate()([Y, Z])

    X = Dense(5, activation='relu')(X)  # next layer, mixing trainable and frozen features
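For completeness, a hypothetical end-to-end version of that snippet, assuming a 10-feature input and a single regression output, might look like this:

import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Concatenate
from tensorflow.keras.models import Model

inputs = Input(shape=(10,))               # assumed 10 input features
X = Dense(5, activation='relu')(inputs)   # previous layer

Y = Dense(3, activation='relu', name='trainable_layer')(X)

non_trainable = Dense(2, activation='relu', name='non_trainable_layer')
non_trainable.trainable = False           # freeze the layer, not its output tensor
Z = non_trainable(X)

X = Concatenate()([Y, Z])
X = Dense(5, activation='relu')(X)
outputs = Dense(1)(X)                     # assumed single regression output

model = Model(inputs, outputs)
print([v.name for v in model.trainable_weights])  # 'non_trainable_layer' weights are absent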

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Yaoshiang
Solution 2: Prefect
Solution 3: