How do I interpret the weights in this branched TensorFlow model?

I did not find an existing question that covers this scenario; if it has already been answered somewhere else, please feel free to point me to it.

Problem

I have a TensorFlow model (a minimal reproducible example is included in the question for anyone to test) with these specs:

  1. Input shape: [(None, 3, 47, 12)]; the output shape can be ignored for the scope of this question.
  2. Somewhere while building the model, I branch off, selecting only one channel from axis 1, so the branch receives inputs with only one channel.
  3. I do not use the Sequential API.
  4. I print out the weight and bias shapes for confirmation.

I am not able to understand why the weight matrix of the convolution layer in the new branch still has the shape (3, ..., ...).

Example:

My Model:

import tensorflow as tf

# Input matching the summary below: (batch, 3, 47, 12)
inputs = tf.keras.Input(shape=(3, 47, 12), name='Input')

# Declaring a couple of layers
Conv_1 = tf.keras.layers.Conv1D(
                                75,
                                activation='swish',
                                kernel_size=3,
                                strides=2,
                                padding='valid',
                                name='Conv_1',
                                # data_format = 'channels_first',
                                )(inputs)
Av_P_1 = tf.keras.layers.AveragePooling2D(
                                          pool_size=(1,3),
                                          strides=(1,2),
                                          padding='valid',
                                          data_format='channels_first',
                                          name='Av_P_1'
                                         )(Conv_1)
Layer_N_1 = tf.keras.layers.LayerNormalization(
                                               name='Layer_N_1'
                                              )(Av_P_1)



Dense_1 = tf.keras.layers.Dense(
                                70,
                                activation='swish',
                                name='Dense_1'
                                )(Layer_N_1)
Layer_N_2 = tf.keras.layers.LayerNormalization(
                                               name='Layer_N_2'
                                              )(Dense_1)

# Unstacking here
s,t,r = tf.unstack(
                   Layer_N_2,
                   axis=1
                   )

# Branching here
Conv_2 = tf.keras.layers.Conv1D(50,
                                activation='swish',
                                kernel_size=3,
                                strides=2,
                                padding='valid',
                                name='Conv_2',
                                )(tf.concat([s], axis=1))  # <-------- Branching here
Av_P_2 = tf.keras.layers.AveragePooling1D(
                                          pool_size=3,
                                          strides=1,
                                          padding='valid',
                                          name = 'Av_P_2'
                                          )(Conv_2)
Layer_N_3 = tf.keras.layers.LayerNormalization(
                                            name='Layer_N_3'
                                            )(Av_P_2)

LSTM_1 = tf.keras.layers.LSTM(35,
                              return_sequences=True,
                              name='LSTM_1'
                              )(Layer_N_3)

test_model = tf.keras.Model(inputs=inputs, outputs=[LSTM_1])
test_model.summary()  # summary() prints directly; wrapping it in print() adds a stray "None"
Model: "model_11"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
Input (InputLayer)           [(None, 3, 47, 12)]       0         
_________________________________________________________________
Conv_1 (Conv1D)              (None, 3, 23, 75)         2775      
_________________________________________________________________
Av_P_1 (AveragePooling2D)    (None, 3, 23, 37)         0         
_________________________________________________________________
Layer_N_1 (LayerNormalizatio (None, 3, 23, 37)         74        
_________________________________________________________________
Dense_1 (Dense)              (None, 3, 23, 70)         2660      
_________________________________________________________________
Layer_N_2 (LayerNormalizatio (None, 3, 23, 70)         140       
_________________________________________________________________
tf.unstack_19 (TFOpLambda)   [(None, 23, 70), (None, 2 0         
_________________________________________________________________
tf.identity_4 (TFOpLambda)   (None, 23, 70)            0         
_________________________________________________________________
Conv_2 (Conv1D)              (None, 11, 50)            10550     
_________________________________________________________________
Av_P_2 (AveragePooling1D)    (None, 9, 50)             0         
_________________________________________________________________
Layer_N_3 (LayerNormalizatio (None, 9, 50)             100       
_________________________________________________________________
LSTM_1 (LSTM)                (None, 9, 35)             12040     
=================================================================
Total params: 28,339
Trainable params: 28,339
Non-trainable params: 0
_________________________________________________________________

Shapes of each element (s, t, r) after the unstack:

(TensorShape([None, 23, 70]),
 TensorShape([None, 23, 70]),
 TensorShape([None, 23, 70]))
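
As a quick sanity check (a minimal sketch, not from the original post), unstacking axis 1 of a tensor shaped like Layer_N_2's output does yield three tensors with that axis removed:

import tensorflow as tf

# Dummy batch of 4 with the same shape as Layer_N_2's output
x = tf.zeros((4, 3, 23, 70))
s, t, r = tf.unstack(x, axis=1)
print(s.shape, t.shape, r.shape)  # (4, 23, 70) (4, 23, 70) (4, 23, 70)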

Shapes of the weights and biases only for the conv layers:


###### Name: ######
Layer Name: Conv_1
  Weights:
  Conv_1/kernel:0
  (3, 12, 75)

  Biases:
  Conv_1/bias:0
  (75,)

###### Name: ######
Layer Name: Conv_2
  Weights:
  Conv_2/kernel:0
  (3, 70, 50)  <---- why does the weight matrix still have 3 channels?

  Biases:
  Conv_2/bias:0
  (50,)
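
The exact printing loop is not shown in the question; a minimal sketch that produces equivalent output for the test_model built above could look like this:

# Print kernel/bias shapes for every Conv1D layer in the model
for layer in test_model.layers:
    if not isinstance(layer, tf.keras.layers.Conv1D):
        continue
    kernel, bias = layer.weights  # a Conv1D with use_bias=True holds exactly these two
    print('###### Name: ######')
    print(f'Layer Name: {layer.name}')
    print(f'  Weights:\n  {kernel.name}\n  {kernel.shape}\n')
    print(f'  Biases:\n  {bias.name}\n  {bias.shape}\n')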

My interpretations

  1. There is no way to "disconnect" the layer outputs the way I want, because the graph somehow disconnects (I do not know how TF evaluates model connectivity); hence the weights are initialized but the subsequent updates are 0. This is probably not the case, though, because TensorFlow lists them as trainable variables, meaning they will be optimized.
  2. I am doing something wrong. I have researched TensorFlow's preprocessing layers, used Lambda functions, and experimented with unstack vs. split (see the sketch after this list).
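
For reference, here is a sketch of the unstack-vs-split difference mentioned in point 2: tf.split keeps the split axis with size 1, while tf.unstack removes it, so the branch receives tensors of different rank:

import tensorflow as tf

x = tf.zeros((4, 3, 23, 70))  # dummy stand-in for Layer_N_2's output
s_split = tf.split(x, num_or_size_splits=3, axis=1)[0]
s_unstack = tf.unstack(x, axis=1)[0]
print(s_split.shape)    # (4, 1, 23, 70): axis 1 kept with size 1
print(s_unstack.shape)  # (4, 23, 70): axis 1 removed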

Please feel free to ask further questions; I know I might not have been clear enough for everyone.

P.S. I am aware of how channels_first and channels_last are meant to be used.



Solution 1:[1]

Following this answer: the 3 in the weight-matrix shape (3, ..., ...) does not correspond to the number of channels; it is the kernel size. In the example above, the kernel size used in the Conv1D layers is 3. If the kernel size is changed to 5, for example, the weight matrices would look like this:

###### Name: ######
Layer Name: Conv_1

  Weights:
  Conv_1/kernel:0
  (5, 12, 75)

  Biases:
  Conv_1/bias:0
  (75,)

###### Name: ######
Layer Name: Conv_2

  Weights:
  Conv_2/kernel:0
  (5, 70, 50)

  Biases:
  Conv_2/bias:0
  (50,)
  • Please also note that the weight matrices are different from the layer outputs; they are laid out as (kernel_size, features (the number of units in the previous layer), conv_units (the number of conv units in the current layer)).
  • The outputs, however, depend on the input shape: channels on axis 1 are preserved by default as an extended batch dimension (read example 2 from the Conv1D documentation). Both points are sketched below.
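
Both points can be checked with a short sketch (shapes chosen to match the branch above, not taken from the original post): the kernel shape follows (kernel_size, last-axis features, filters), and the extra leading axis passes through Conv1D untouched as a batch-like dimension:

import tensorflow as tf

inp = tf.keras.Input(shape=(3, 23, 70))  # axis 1 acts as an extended batch dim
conv = tf.keras.layers.Conv1D(50, kernel_size=5, padding='valid')
out = conv(inp)
print(conv.kernel.shape)  # (5, 70, 50): kernel_size, prev-layer features, filters
print(out.shape)          # (None, 3, 19, 50): the 3 on axis 1 passes through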

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
[1] Solution 1: AvidJoe