Calculating training and testing accuracy of an LSTM
I am building an LSTM model with the following code, and I wish to calculate its training and testing accuracies. I am a novice in machine learning, and the only method I know for calculating accuracy is sklearn's accuracy_score.
y_train = pd.Series(y_train)
lstm_model = Sequential()
lstm_model.add(Embedding(top_words, 32, input_length=req_length))
lstm_model.add(Flatten())
lstm_model.add(Reshape((req_length, 32)))
lstm_model.add(LSTM(units=50, return_sequences=True))
lstm_model.add(Dropout(0.2))
lstm_model.add(Dense(256, activation='relu'))
lstm_model.add(Dropout(0.2))
lstm_model.add(Dense(1, activation='sigmoid'))
lstm_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
lstm = lstm_model.fit(X_train, y_train, epochs=30, batch_size=10)
To calculate y_pred, I wrote y_pred = lstm_model.predict(X_test). However, I cannot use the accuracy_score function on y_pred because its shape is (600, 401, 1). What can I do to fix this? Could you share some code?
Solution 1:
If you print lstm_model.summary(), you might see:
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 embedding (Embedding)       (None, 401, 32)           160000
 flatten (Flatten)           (None, 12832)             0
 reshape (Reshape)           (None, 401, 32)           0
 lstm (LSTM)                 (None, 401, 50)           16600
 dropout (Dropout)           (None, 401, 50)           0
 dense (Dense)               (None, 401, 256)          13056
 dropout_1 (Dropout)         (None, 401, 256)          0
 dense_1 (Dense)             (None, 401, 1)            257
=================================================================
Total params: 189,913
Trainable params: 189,913
Non-trainable params: 0
_________________________________________________________________
As we can see, the last Dense layer produces output of shape (None, 401, 1). The number 401 appears throughout the network: it is the number of elements (words) in each sequence and, as far as I understand, comes from your req_length variable.
Now let's look closer at this line of code:
lstm_model.add(LSTM(units=50, return_sequences=True))
Here, you specify the LSTM layer. But wait, what does return_sequences=True mean? According to the documentation:
return_sequences: Boolean. Whether to return the last output in the output sequence, or the full sequence. Default: False.
What you set here depends on your task. With True, you are telling the LSTM layer to produce as many outputs as there are words in the sequence, so you get 401 vectors from it. However, judging by your model architecture, you want to solve a binary classification task, and in that case it is more logical for the LSTM to output only one vector. Thus, I suggest setting this parameter to False:
lstm_model.add(LSTM(units=50, return_sequences=False))
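A shape-only sketch of what the flag changes, with plain numpy arrays standing in for the LSTM's per-timestep hidden states (the batch size of 2 is an arbitrary example):

```python
import numpy as np

batch, timesteps, units = 2, 401, 50

# Stand-in for the hidden states an LSTM computes, one per timestep
all_states = np.zeros((batch, timesteps, units))

seq_output = all_states              # return_sequences=True  -> (2, 401, 50)
last_output = all_states[:, -1, :]   # return_sequences=False -> (2, 50)

print(seq_output.shape, last_output.shape)
```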
Compare the new model architecture with the previous one:
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 embedding (Embedding)       (None, 401, 32)           160000
 flatten (Flatten)           (None, 12832)             0
 reshape (Reshape)           (None, 401, 32)           0
 lstm (LSTM)                 (None, 50)                16600
 dropout (Dropout)           (None, 50)                0
 dense (Dense)               (None, 256)               13056
 dropout_1 (Dropout)         (None, 256)               0
 dense_1 (Dense)             (None, 1)                 257
=================================================================
Total params: 189,913
Trainable params: 189,913
Non-trainable params: 0
_________________________________________________________________
Now your LSTM layer outputs only one vector of size 50 per batch element, not 401 of them. Accordingly, the final Dense layer produces a single value per batch element: the prediction for the whole input sequence.
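With that change, lstm_model.predict(X_test) returns one sigmoid probability per sample (shape (600, 1)), so you can threshold it at 0.5 and pass the result to sklearn's accuracy_score. A minimal sketch with made-up probabilities standing in for the predict() output (X_test, y_test, and lstm_model are assumed from your setup):

```python
import numpy as np

# probs = lstm_model.predict(X_test)             # shape (n_samples, 1) after the fix
probs = np.array([[0.9], [0.2], [0.7], [0.4]])   # made-up stand-in values
y_true = np.array([1, 0, 1, 1])                  # made-up true labels

# Threshold the sigmoid outputs at 0.5 to get hard 0/1 class predictions
y_pred = (probs > 0.5).astype(int).ravel()

# Same value sklearn's accuracy_score(y_true, y_pred) would return
accuracy = float(np.mean(y_pred == y_true))
print(accuracy)
```

Also, since you compiled with metrics=['accuracy'], lstm_model.evaluate(X_test, y_test) reports test accuracy directly, and the per-epoch training accuracy is stored in the history object returned by fit (lstm.history['accuracy'] in recent Keras versions).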
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow