Calculating training and testing accuracy of an LSTM
I am building an LSTM model with the following code, and I wish to calculate its training and testing accuracies. I am a novice in machine learning, and the only method I know for calculating accuracy is sklearn's accuracy_score.
y_train = pd.Series(y_train)
lstm_model = Sequential()
lstm_model.add(Embedding(top_words, 32, input_length=req_length))
lstm_model.add(Flatten())
lstm_model.add(Reshape((req_length, 32)))
lstm_model.add(LSTM(units=50, return_sequences=True))
lstm_model.add(Dropout(0.2))
lstm_model.add(Dense(256, activation='relu'))
lstm_model.add(Dropout(0.2))
lstm_model.add(Dense(1, activation='sigmoid'))
lstm_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
lstm = lstm_model.fit(X_train, y_train, epochs=30, batch_size=10)
To calculate y_pred, I wrote y_pred = lstm_model.predict(X_test). However, I cannot use the accuracy_score function on y_pred because its shape is (600, 401, 1). What can I do to fix this? Could you share some code?
Solution 1:
If you print lstm_model.summary(), you might see:
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 embedding (Embedding)       (None, 401, 32)           160000
 flatten (Flatten)           (None, 12832)             0
 reshape (Reshape)           (None, 401, 32)           0
 lstm (LSTM)                 (None, 401, 50)           16600
 dropout (Dropout)           (None, 401, 50)           0
 dense (Dense)               (None, 401, 256)          13056
 dropout_1 (Dropout)         (None, 401, 256)          0
 dense_1 (Dense)             (None, 401, 1)            257
=================================================================
Total params: 189,913
Trainable params: 189,913
Non-trainable params: 0
_________________________________________________________________
As we can see, the last Dense layer produces output of shape (None, 401, 1). The number 401 appears throughout the network: it is the number of elements (words) in each sequence and, as far as I understand, comes from your req_length variable.
Now let's look closer at this line of code:
lstm_model.add(LSTM(units=50, return_sequences=True))
Here, you specify the LSTM layer. But wait, what does return_sequences=True mean? According to the documentation:
return_sequences: Boolean. Whether to return the last output in the output sequence, or the full sequence. Default: False.
What you set here depends on your task. With True, you are telling the LSTM layer to produce as many outputs as there are words in the sequence, so you get 401 vectors from it. However, judging by your model architecture, you want to solve a binary classification task, and in that case it is more logical for the LSTM to output only one vector. Thus, I suggest setting this parameter to False:
lstm_model.add(LSTM(units=50, return_sequences=False))
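A shape-only sketch of what the flag changes, with plain numpy arrays standing in for the LSTM's per-timestep hidden states (the batch size of 2 is an arbitrary example):

```python
import numpy as np

batch, timesteps, units = 2, 401, 50

# Stand-in for the hidden states an LSTM computes, one per timestep
all_states = np.zeros((batch, timesteps, units))

seq_output = all_states              # return_sequences=True  -> (2, 401, 50)
last_output = all_states[:, -1, :]   # return_sequences=False -> (2, 50)

print(seq_output.shape, last_output.shape)
```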
Compare the new model architecture with the previous one:
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 embedding (Embedding)       (None, 401, 32)           160000
 flatten (Flatten)           (None, 12832)             0
 reshape (Reshape)           (None, 401, 32)           0
 lstm (LSTM)                 (None, 50)                16600
 dropout (Dropout)           (None, 50)                0
 dense (Dense)               (None, 256)               13056
 dropout_1 (Dropout)         (None, 256)               0
 dense_1 (Dense)             (None, 1)                 257
=================================================================
Total params: 189,913
Trainable params: 189,913
Non-trainable params: 0
_________________________________________________________________
Now your LSTM layer outputs only one vector of size 50 per batch element, not 401 of them. Accordingly, the final Dense layer produces a single value per batch element: the prediction for the whole input sequence.
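With that change, lstm_model.predict(X_test) returns one sigmoid probability per sample (shape (600, 1)), so you can threshold it at 0.5 and pass the result to sklearn's accuracy_score. A minimal sketch with made-up probabilities standing in for the predict() output (X_test, y_test, and lstm_model are assumed from your setup):

```python
import numpy as np

# probs = lstm_model.predict(X_test)             # shape (n_samples, 1) after the fix
probs = np.array([[0.9], [0.2], [0.7], [0.4]])   # made-up stand-in values
y_true = np.array([1, 0, 1, 1])                  # made-up true labels

# Threshold the sigmoid outputs at 0.5 to get hard 0/1 class predictions
y_pred = (probs > 0.5).astype(int).ravel()

# Same value sklearn's accuracy_score(y_true, y_pred) would return
accuracy = float(np.mean(y_pred == y_true))
print(accuracy)
```

Also, since you compiled with metrics=['accuracy'], lstm_model.evaluate(X_test, y_test) reports test accuracy directly, and the per-epoch training accuracy is stored in the history object returned by fit (lstm.history['accuracy'] in recent Keras versions).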
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow