Error with dimensionality when fitting a stateful RNN

I am fitting a stateful RNN with an embedding layer to perform binary classification. I am confused about the batch_size and batch_shape arguments needed by the Keras APIs.

xtrain_padded.shape = (9600, 1403); xtest_padded.shape = (2400, 1403); ytest.shape = (2400,)
input_dim = size of tokenizer word dictionary
output_dim = 100 from GloVe_100d embeddings
number of SimpleRNN layer units = 200

h0: initial hidden states sampled from a random uniform distribution.
The h0 object has the same shape as the RNN layer hidden states obtained when return_state = True.

The model structure:

batch_size = 2400  # highest common factor of xtrain and xtest
inp= Input(batch_shape= (batch_size, input_length), name= 'input') 
emb_out= Embedding(input_dim, output_dim, input_length= input_length, 
                         weights= [Emat], trainable= False, name= 'embedding')(inp)

rnn= SimpleRNN(200, return_sequences= True, return_state= True, stateful= True,
              batch_size= (batch_size, input_length, 100), name= 'simpleRNN')

h0 = tf.random.uniform((batch_size, input_length, 200))
rnn_out, rnn_state = rnn(emb_out, initial_state=h0)
mod_out= Dense(1, activation= 'sigmoid')(rnn_out)
# Extract the y_t's and h_t's:
model = Model(inputs=inp, outputs=[mod_out, rnn_out])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input (InputLayer)           [(2400, 1403)]            0         
_________________________________________________________________
embedding (Embedding)        (2400, 1403, 100)         4348900   
_________________________________________________________________
simpleRNN (SimpleRNN)        [(2400, 1403, 200), (2400 60200     
_________________________________________________________________
dense_3 (Dense)              (2400, 1403, 1)           201       

There is no issue when I call the model directly on the test data:

mod_out_allsteps, rnn_ht= model(xte_pad)  # Same as the 2 items from model.predict(xte_pad) 
print(mod_out_allsteps.shape, rnn_ht.shape) 
>> (2400, 1403, 1) (2400, 1403, 200)

However, model.fit raises a ValueError about unequal dimensions:

model.fit(xte_pad, yte, epochs =1, batch_size = batch_size, verbose = 1)
>>
    ValueError: Dimensions must be equal, but are 2400 and 1403 for '{{node binary_crossentropy_1/mul}} = Mul[T=DT_FLOAT](binary_crossentropy_1/Cast, binary_crossentropy_1/Log)' with input shapes: [2400,1], [2400,1403,200].

The error seems to suggest that, when fitting the data, the model confuses the returned hidden states rnn_ht, shaped [2400,1403,200], with something else. However, I need these states to compute the gradients with respect to the initial hidden states for t = 1, ..., 1403.
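
One way to read the error message is as a broadcasting failure between the two shapes it reports. The sketch below is plain Python, not TensorFlow: broadcast_shape is a hypothetical helper written for illustration, and it assumes the elementwise multiply in the loss follows NumPy-style broadcasting, where shapes are aligned from the last axis. Under that assumption, [2400,1] against [2400,1403,200] fails exactly at the pair 2400 and 1403, which would be consistent with the loss being applied to the second model output as well as the first.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Hypothetical helper: broadcast two shapes NumPy-style or raise ValueError."""
    result = []
    # Walk both shapes from the trailing axis; a missing axis counts as size 1.
    for da, db in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if da != db and da != 1 and db != 1:
            raise ValueError(f"Dimensions must be equal, but are {da} and {db}")
        result.append(max(da, db))
    return tuple(reversed(result))


labels = (2400, 1)          # first shape reported in the error
states = (2400, 1403, 200)  # second shape reported in the error (the hidden states)

try:
    broadcast_shape(labels, states)
    clash = None
except ValueError as err:
    clash = str(err)

print(clash)
```

The last axes (1 and 200) are compatible, so the mismatch only appears one axis further left, where the batch axis of the labels lines up with the time axis of the states.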

I am confused about the dimensions in stateful RNNs:

  1. If stateful = True, is the model constructed for a single mini-batch,
    i.e. is the first index in the Output Shape of each layer the batch_size?
  2. What batch_shape should be set in the first layer (Input)? Have I set it correctly?
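
On the choice of batch size: a stateful layer is built for one fixed batch size, so the training and test sample counts both need to be divisible by it. The highest common factor is only the largest such value; any common divisor should work in principle. A small sketch using only the sample counts from the shapes above (the variable names are illustrative):

```python
from math import gcd

n_train, n_test = 9600, 2400  # sample counts taken from the shapes in the question

# Largest batch size that divides both sets evenly.
largest = gcd(n_train, n_test)

# Every divisor of the gcd also divides both sample counts.
common = [d for d in range(1, largest + 1) if largest % d == 0]

print(largest)      # 2400
print(common[-5:])  # [480, 600, 800, 1200, 2400]
```

A smaller common divisor keeps the first index of every Output Shape smaller while still satisfying the fixed-batch requirement.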

Thank you in advance for helping with the error and my confusion!


Update:

batch_size = 2400  # highest common factor of xtrain and xtest
input_length = 1403
output_dim = 100
inp= tf.keras.layers.Input(batch_shape= (batch_size, input_length), name= 'input') 
emb_out=  tf.keras.layers.Embedding(500, output_dim, input_length= input_length, trainable= False, name= 'embedding')(inp)

rnn=  tf.keras.layers.SimpleRNN(200, return_sequences= True, return_state= False, stateful= True,
              batch_size= (batch_size, input_length, 100), name= 'simpleRNN')
rnn_ht= rnn(emb_out)  # hidden states at all steps 
print(rnn_ht.shape)
>>> 
(2400, 1403, 200)

mod_out= tf.keras.layers.Dense(1, activation= 'sigmoid')(tf.keras.layers.Flatten()(rnn_ht))
# Extract the y_t's and h_t's:
model =  tf.keras.Model(inputs=inp, outputs=[mod_out, rnn_ht])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input (InputLayer)           [(2400, 1403)]            0         
_________________________________________________________________
embedding (Embedding)        (2400, 1403, 100)         50000     
_________________________________________________________________
simpleRNN (SimpleRNN)        (2400, 1403, 200)         60200     
_________________________________________________________________
flatten_4 (Flatten)          (2400, 280600)            0         
_________________________________________________________________
dense_4 (Dense)              (2400, 1)                 280601    
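
As a sanity check on the summary above, the shapes and parameter counts can be reproduced by hand. This is a sketch in plain Python; the helper names are hypothetical, and it assumes the usual counts of kernel, recurrent kernel and bias for a SimpleRNN layer and kernel plus bias for a Dense layer.

```python
def simple_rnn_params(input_dim, units):
    # kernel (input_dim x units) + recurrent kernel (units x units) + bias (units)
    return units * (input_dim + units + 1)


def dense_params(input_dim, units):
    # kernel (input_dim x units) + bias (units)
    return units * (input_dim + 1)


timesteps, emb_dim, rnn_units, vocab = 1403, 100, 200, 500  # values from the update

embedding = vocab * emb_dim                   # embedding matrix entries
rnn = simple_rnn_params(emb_dim, rnn_units)   # SimpleRNN weights
flat = timesteps * rnn_units                  # width after Flatten
dense = dense_params(flat, 1)                 # single sigmoid unit

print(embedding, rnn, flat, dense)  # 50000 60200 280600 280601
```

The numbers match the summary, so the layers themselves are wired as intended and the first index of each Output Shape is the fixed batch size of 2400.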


mod_out_allsteps, rnn_ht= model(xte_pad)
print(mod_out_allsteps.shape, rnn_ht.shape)  
>>> 
(2400, 1) (2400, 1403, 200)

But the error with model.fit persists.



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
