'Pytorch many-to-many time series LSTM always predicts the mean
I want to create an LSTM model using pytorch that takes multiple time series and creates predictions of all of them, a typical "many-to-many" LSTM network.
I am able to achieve what I want in keras. I create a set of data with three variables which are simply linearly spaced with some gaussian noise. Training the keras model I get a prediction 12 steps ahead that is reasonable.
When I try the same thing in pytorch the, model will always predict the mean of the input data. This is confirmed when looking at the loss during training I can see that the model never seems to perform better than just predicting the mean.
TL;DR; The question is: How can I achieve the same thing in pytorch as in the keras example in the gist below?
Full working examples are available here https://gist.github.com/jonlachmann/5cd68c9667a99e4f89edc0c307f94ddb
The keras network is defined as
model = Sequential()
model.add(LSTM(100, activation='relu', return_sequences=True, input_shape=(n_steps, n_features)))
model.add(LSTM(100, activation='relu'))
model.add(Dense(n_features))
model.compile(optimizer='adam', loss='mse')
and the pytorch network is
# Define the pytorch model
class torchLSTM(torch.nn.Module):
def __init__(self, n_features, seq_length):
super(torchLSTM, self).__init__()
self.n_features = n_features
self.seq_len = seq_length
self.n_hidden = 100 # number of hidden states
self.n_layers = 1 # number of LSTM layers (stacked)
self.l_lstm = torch.nn.LSTM(input_size=n_features,
hidden_size=self.n_hidden,
num_layers=self.n_layers,
batch_first=True)
# according to pytorch docs LSTM output is
# (batch_size,seq_len, num_directions * hidden_size)
# when considering batch_first = True
self.l_linear = torch.nn.Linear(self.n_hidden * self.seq_len, 3)
def init_hidden(self, batch_size):
# even with batch_first = True this remains same as docs
hidden_state = torch.zeros(self.n_layers, batch_size, self.n_hidden)
cell_state = torch.zeros(self.n_layers, batch_size, self.n_hidden)
self.hidden = (hidden_state, cell_state)
def forward(self, x):
batch_size, seq_len, _ = x.size()
lstm_out, self.hidden = self.l_lstm(x, self.hidden)
# lstm_out(with batch_first = True) is
# (batch_size,seq_len,num_directions * hidden_size)
# for following linear layer we want to keep batch_size dimension and merge rest
# .contiguous() -> solves tensor compatibility error
x = lstm_out.contiguous().view(batch_size, -1)
return self.l_linear(x)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
