'Pytorch many-to-many time series LSTM always predicts the mean

I want to create an LSTM model using pytorch that takes multiple time series and creates predictions of all of them, a typical "many-to-many" LSTM network.

I am able to achieve what I want in keras. I create a set of data with three variables which are simply linearly spaced with some gaussian noise. Training the keras model I get a prediction 12 steps ahead that is reasonable.

When I try the same thing in pytorch the, model will always predict the mean of the input data. This is confirmed when looking at the loss during training I can see that the model never seems to perform better than just predicting the mean.

TL;DR; The question is: How can I achieve the same thing in pytorch as in the keras example in the gist below?

Full working examples are available here https://gist.github.com/jonlachmann/5cd68c9667a99e4f89edc0c307f94ddb

The keras network is defined as

model = Sequential()
model.add(LSTM(100, activation='relu', return_sequences=True, input_shape=(n_steps, n_features)))
model.add(LSTM(100, activation='relu'))
model.add(Dense(n_features))
model.compile(optimizer='adam', loss='mse')

and the pytorch network is

# Define the pytorch model
class torchLSTM(torch.nn.Module):
    def __init__(self, n_features, seq_length):
        super(torchLSTM, self).__init__()
        self.n_features = n_features
        self.seq_len = seq_length
        self.n_hidden = 100  # number of hidden states
        self.n_layers = 1  # number of LSTM layers (stacked)

        self.l_lstm = torch.nn.LSTM(input_size=n_features,
                                    hidden_size=self.n_hidden,
                                    num_layers=self.n_layers,
                                    batch_first=True)
        # according to pytorch docs LSTM output is
        # (batch_size,seq_len, num_directions * hidden_size)
        # when considering batch_first = True
        self.l_linear = torch.nn.Linear(self.n_hidden * self.seq_len, 3)

    def init_hidden(self, batch_size):
        # even with batch_first = True this remains same as docs
        hidden_state = torch.zeros(self.n_layers, batch_size, self.n_hidden)
        cell_state = torch.zeros(self.n_layers, batch_size, self.n_hidden)
        self.hidden = (hidden_state, cell_state)

    def forward(self, x):
        batch_size, seq_len, _ = x.size()

        lstm_out, self.hidden = self.l_lstm(x, self.hidden)
        # lstm_out(with batch_first = True) is
        # (batch_size,seq_len,num_directions * hidden_size)
        # for following linear layer we want to keep batch_size dimension and merge rest
        # .contiguous() -> solves tensor compatibility error
        x = lstm_out.contiguous().view(batch_size, -1)
        return self.l_linear(x)


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source