Follow-up question regarding a Keras model issue

So about a week ago I posted this question: Issues running a Keras model with custom layers. The suggestion there was to make the question smaller and try to debug it myself. I believe I've managed to do something like that, but I still have some problems with it. Because of how long the original post is, I'm making a new question.

Here's the current, much simplified code I've been trying to debug (you can probably ignore the custom callbacks):

# Imports for the layers used below; AttLayer is the custom attention
# layer from the original question and is defined elsewhere.
from keras.layers import Input, Embedding, Dense, GRU, Bidirectional
from keras.models import Model
from keras.optimizers import RMSprop

embedding_layer = Embedding(len(word_index) + 1,
                            embedding_dim,
                            weights=[embedding_matrix],
                            input_length=self.MAX_SENTENCE_LENGTH,
                            trainable=True,
                            mask_zero=True,
                            name='sent_embed')

sentence_input = Input(shape=(self.MAX_SENTENCE_LENGTH,), dtype='int32', name='sent_input')
embedded_sequences = embedding_layer(sentence_input)
l_lstm = Bidirectional(GRU(EMBEDDING_DIM, return_sequences=True), name='word_lstm')(embedded_sequences)
l_att = AttLayer(name='word_attention')(l_lstm)
dense = Dense(EMBEDDING_DIM, activation='relu', name='dense_relu')(l_att)
preds = Dense(2, activation='softmax', name='dense_final')(dense)
model = Model(inputs=sentence_input, outputs=preds)
optimize = RMSprop(lr=0.001)
model.compile(loss='categorical_crossentropy', optimizer=optimize)

model.fit(encoded_train_x[:, 0], y=train_y[:],
          validation_data=(encoded_val_x[:, 0], val_y[:]),
          batch_size=batch_size, epochs=epochs, verbose=1,
          callbacks=callbacks)

I've been trying to run this code on Keras 2.2 with TF 1.13 and on Keras 2.4 with TF 2.4, and I get different issues on each combination.

The input shape errors I'm getting on the former version occur whenever my batch_size != MAX_SENTENCE_LENGTH. Firstly, that doesn't make any sense to me; secondly, it's a problem because I'm training this model on a data set of 1359 samples, which I need to split between the train and validation sets, so it's difficult to pick one constant batch_size for the model input. I could drop some samples so that everything divides into equal-sized batches, but that seems like a bad workaround I'd rather avoid.
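To make the numbers concrete, this is the kind of split I have in mind (a minimal sketch with random dummy arrays standing in for my real encoded inputs and labels, so the names and the 80/20 ratio are only placeholders):

import numpy as np
from sklearn.model_selection import train_test_split

# Dummy stand-ins for the real data: 1359 samples, MAX_SENTENCE_LENGTH = 20 tokens each
encoded_x = np.random.randint(1, 1000, size=(1359, 20)).astype('int32')
labels = np.random.randint(0, 2, size=(1359,))

# An 80/20 split of 1359 samples gives 1087 train / 272 validation samples,
# so no single batch_size divides both sets evenly.
encoded_train_x, encoded_val_x, train_y, val_y = train_test_split(
    encoded_x, labels, test_size=0.2, random_state=42)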

On the latter combination of Keras and TF versions, I'm getting a different error, this time from the cross-entropy loss:

ValueError: Can not squeeze dim[1], expected a dimension of 1, got 20 for '{{node categorical_crossentropy/weighted_loss/Squeeze}} = Squeeze[T=DT_FLOAT, squeeze_dims=[-1]](Cast)' with input shapes: [?,20].
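From what I can tell, the squeeze fails because a tensor of shape (?, 20) reaches the loss where the last dimension is expected to be 1, while the model's output from Dense(2, activation='softmax') is (batch, 2). As a sanity check on the target shape (a minimal sketch with made-up labels, just to illustrate what categorical_crossentropy expects the targets to look like):

import numpy as np
from tensorflow.keras.utils import to_categorical

# Made-up integer class labels, one per sample (0 or 1)
raw_labels = np.random.randint(0, 2, size=(1359,))

# One-hot encoding gives shape (1359, 2), matching Dense(2, activation='softmax')
one_hot_y = to_categorical(raw_labels, num_classes=2)
print(one_hot_y.shape)  # (1359, 2)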

Could someone explain why these issues occur, and how to fix them? I'd like to keep the question theoretical, but if you want to try debugging this I can provide the code for generating the embedding_matrix along with sample inputs.

EDIT:

embedding_dim = 100   # always, because I'm using 100d glove embeddings
MAX_SENTENCE_LENGTH = 20  # I need to be able to increase this
batch_size = 20  # only works if this is equal to MAX_SENTENCE_LENGTH


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
