Problems with a recurrent neural network working with time steps in Keras
I am trying to design a recurrent classification network with Keras. I have analyzed key characteristics of the frames of a video, and from them I want to identify when certain events occur during the video.
Specifically, I have a matrix (30 x 2) for each frame, which represents the positions of various given objects. From these positions, I would like the network to detect 4 different events, as well as in which frames they occur.
As an example, suppose I have the position of 30 cars in each frame already detected, and I want the network to learn to detect the frames in which:
- a car stops
- a car starts
- two cars collide
- a car turns
In each frame, one of these events or none of them (category 0) can occur, but never more than one.
Notably, identifying these 4 events requires knowing the data of both the previous frames and the later ones. For example, to know that two cars collide, it is necessary to know that both were in motion beforehand, and that neither moves after the collision.
Following this example, and just to clarify: suppose I have a sample of 100 frames, with crashes at frames 4 and 75, a stop at 12, a start at 37, and turns at 3, 30, and 60. The input would then be 100x30x2, and the output 100x1.
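The shapes described above can be sketched with dummy arrays. This is only an illustration; the integer labels (0 = nothing, 1 = stop, 2 = start, 3 = collide, 4 = turn) are an assumed encoding, not something fixed by the question.

```python
import numpy as np

# 100 frames, 30 tracked cars, (x, y) position per car.
n_frames, n_cars = 100, 30
x = np.zeros((n_frames, n_cars, 2), dtype="float32")  # input: positions per frame
y = np.zeros((n_frames,), dtype="int64")              # output: one event class per frame

# Events from the example above: crashes at frames 4 and 75,
# a stop at 12, a start at 37, turns at 3, 30 and 60.
y[[4, 75]] = 3
y[12] = 1
y[37] = 2
y[[3, 30, 60]] = 4

print(x.shape, y.shape)  # (100, 30, 2) (100,)
```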
After several hours, I get the feeling that I am not understanding how to describe the model to Keras.
So far I have been trying the following, with variations in the number of LSTM layers and the number of classification neurons:
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential()
model.add(layers.LSTM(100, input_shape=(30, 2)))
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dense(5, activation='sigmoid'))
model.summary()
model.compile(loss='sparse_categorical_crossentropy', optimizer='Adam', metrics=['SparseCategoricalAccuracy'])
I have also tried to introduce the variation
model.add(layers.LSTM(100, input_shape=(30, 2), return_sequences=True))
so that not only the final output is taken into account, but then the model does not work unless I add a Flatten layer, and I deduce that I am not understanding this well.
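One way to make return_sequences=True fit together with per-frame labels is a sequence-labelling setup: treat each video as one sequence and flatten the (30 x 2) positions into 60 features per frame. A Dense layer after an LSTM with return_sequences=True is applied to every time step, so no Flatten is needed and the model emits one 5-way softmax per frame. The Bidirectional wrapper is one option for the requirement that later frames matter too. This is a sketch under those assumptions, not the asker's exact setup:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_frames, n_features, n_classes = 100, 60, 5  # 60 = 30 cars * (x, y)

model = keras.Sequential([
    layers.Input(shape=(n_frames, n_features)),
    # Bidirectional, since both past and future frames carry information.
    layers.Bidirectional(layers.LSTM(100, return_sequences=True)),
    layers.Dense(16, activation="relu"),          # applied per time step
    layers.Dense(n_classes, activation="softmax"),  # per-frame class probabilities
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam",
              metrics=["sparse_categorical_accuracy"])

x = np.random.rand(4, n_frames, n_features).astype("float32")
print(model.predict(x, verbose=0).shape)  # (4, 100, 5): one prediction per frame
```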
Edit 1:
After your advice, I have now the following model:
I start with the dataset, stored in an input xp and an output yp. Here I print the sizes of both variables:
xp.shape, yp.shape
((203384, 25, 2), (203384, 1))
Then I one-hot encode yp with keras.utils.to_categorical, and I reshape each input element from a (25x2) matrix to a (1x50) vector:
n = len(xp)
yp_encoded = keras.utils.to_categorical(yp)
xp_reshaped = xp[0:n,:].reshape(n,1,50)
print(n, xp_reshaped.shape, yp_encoded.shape)
203384, (203384, 1, 50), (203384, 5)
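This preprocessing step can be reproduced on a small dummy dataset to check the shapes; the sample count of 8 here is arbitrary, and 5 classes (labels 0-4) are assumed as in the question:

```python
import numpy as np
from tensorflow.keras.utils import to_categorical

n = 8
xp = np.random.rand(n, 25, 2).astype("float32")   # (25 x 2) positions per sample
yp = np.random.randint(0, 5, size=(n, 1))         # integer class per sample

yp_encoded = to_categorical(yp, num_classes=5)    # (n, 5) one-hot vectors
xp_reshaped = xp.reshape(n, 1, 50)                # one time step of 50 features

print(n, xp_reshaped.shape, yp_encoded.shape)  # 8 (8, 1, 50) (8, 5)
```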
Afterwards, I define the model as discussed, with just LSTM layers:
batch_size = 10
model = keras.Sequential()
model.add(layers.LSTM(100, batch_input_shape=(batch_size, 1, 50), activation = 'relu', return_sequences = True, stateful=True))
model.add(layers.LSTM(5, stateful=True, activation = 'softmax'))
model.summary()
Model: "sequential_140"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_184 (LSTM) (10, 1, 100) 58800
lstm_185 (LSTM) (10, 5) 2120
=================================================================
Total params: 60,920
Trainable params: 60,920
Non-trainable params: 0
So, as I understand it, I have an LSTM model with an input of (batch_size, 1, 50) elements, which I should fit against an output of (batch_size, 5).
Then I compile and fit the model:
model.compile(loss='categorical_crossentropy', optimizer = 'Adam', metrics = ['SparseCategoricalAccuracy'])
model.fit(xp_reshaped, yp_encoded, epochs = 5, batch_size = batch_size, shuffle = False)
I get the following error:
ValueError: Can not squeeze dim[1], expected a dimension of 1, got 5 for '{{node Squeeze}} = Squeeze[T=DT_FLOAT, squeeze_dims=[-1]](IteratorGetNext:1)' with input shapes: [10,5].
Solution 1:
You're trying to solve a classification problem: classify frames as events with the classes start, stop, turn, collide, and the 0 class (nothing happens). You've chosen the right approach (LSTMs), though your architecture is not good.
All your layers should be LSTMs with stateful=True and return_sequences=True, you should transform your target variable to one-hot encoding, and set the last layer's activation to softmax; its output shape will be (n_frames, 5).
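A minimal sketch of that advice, assuming frames are fed one per time step in fixed-size batches (the layer sizes here are placeholders, and the final Dense softmax stands in for "last layer's activation set to softmax"):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

batch_size, n_features, n_classes = 10, 50, 5

model = keras.Sequential([
    # stateful LSTMs require a fixed batch size, hence batch_shape.
    layers.Input(batch_shape=(batch_size, 1, n_features)),
    layers.LSTM(100, return_sequences=True, stateful=True),
    layers.LSTM(32, return_sequences=True, stateful=True),
    layers.Dense(n_classes, activation="softmax"),  # 4 events + class 0
])
model.compile(loss="categorical_crossentropy", optimizer="adam",
              metrics=["categorical_accuracy"])

x = np.random.rand(batch_size, 1, n_features).astype("float32")
print(model.predict(x, verbose=0).shape)  # (10, 1, 5)
```

Note also that the metric must match the targets: SparseCategoricalAccuracy expects integer labels, which is what triggers the "Can not squeeze dim[1]" error when it is given one-hot (batch, 5) targets; with to_categorical targets, use categorical_crossentropy together with a categorical (not sparse) accuracy metric.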
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | desertnaut |
