Keras model.predict() output for regression does not match label vector

My data contains 520 time series, each of length 2297:

  • X_train = numpy.ndarray of shape (338, 2297, 1)
  • X_val = numpy.ndarray of shape (85, 2297, 1)
  • X_test = numpy.ndarray of shape (97, 2297, 1)
  • y_train = numpy.ndarray of shape (338,)
  • y_val = numpy.ndarray of shape (85,)
  • y_test = numpy.ndarray of shape (97,)

My goal is to predict the number of a certain type of anomaly pattern in each of the 97 test time series by using regression in a convolutional neural network.

Model:

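# Imports assumed for this snippet (TensorFlow's bundled Keras)
from tensorflow import keras
from tensorflow.keras.layers import Conv1D, Dense, Dropout
from tensorflow.keras.optimizers import Adam

model = keras.Sequential()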
model.add(Conv1D(32, 2, activation='relu', input_shape=(2297, 1)))
model.add(Dropout(0.1))
model.add(Conv1D(256, 2, activation='relu'))
model.add(Dropout(0.2))
model.add(Conv1D(64, 2, activation='relu'))
model.add(Dense(1))

model.compile(optimizer=Adam(learning_rate=0.001), loss='mse', metrics=['mae', 'mse'])
history = model.fit(X_train, y_train, epochs=20, validation_data=(X_val, y_val), verbose=1)

Evaluation and Prediction:

loss, mae, mse = model.evaluate(X_test, y_test, verbose=1)
y_hat = model.predict(X_test)

The problem is that y_hat is a numpy.ndarray of shape (97, 2294, 1), i.e. it holds a sequence of 2294 values for each of the 97 test series. I expected 97 numbers, each giving the predicted count of anomaly patterns for one time series in the test data.
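
For reference, the observed shape can be reproduced with plain arithmetic. This is only a sketch, assuming the Keras defaults padding='valid' and strides=1 for Conv1D; the helper name conv1d_output_length is made up for illustration.

```python
def conv1d_output_length(length, kernel_size, stride=1):
    """Output length of a 1D convolution with 'valid' padding (assumed default)."""
    return (length - kernel_size) // stride + 1


length = 2297
for kernel_size in (2, 2, 2):  # the three Conv1D layers of the model above
    length = conv1d_output_length(length, kernel_size)

# Dense(1) is applied to the last axis only, so the time axis is kept.
output_shape = (97, length, 1)
print(output_shape)  # (97, 2294, 1)
```

Each kernel of size 2 shortens the sequence by one step (2297, 2296, 2295, 2294), and nothing in the model collapses the time axis before the final Dense layer.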

Why does model.predict() return such a strange shape?

And how can I get a single integer prediction for each of the time series in the test data?
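
One possible direction, sketched under assumptions rather than confirmed by the post: collapse the time axis before the final Dense layer (GlobalAveragePooling1D is used here; Flatten would be another option) and round the float outputs afterwards. The random input X_demo, the batch of 4, and the rounding/clipping step are illustrative only, and the model below is untrained.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras.layers import Conv1D, Dense, Dropout, GlobalAveragePooling1D

model = keras.Sequential([
    keras.Input(shape=(2297, 1)),
    Conv1D(32, 2, activation='relu'),
    Dropout(0.1),
    Conv1D(256, 2, activation='relu'),
    Dropout(0.2),
    Conv1D(64, 2, activation='relu'),
    GlobalAveragePooling1D(),  # (batch, 2294, 64) -> (batch, 64)
    Dense(1),                  # (batch, 64) -> (batch, 1)
])

# Hypothetical stand-in for X_test: 4 random series of length 2297.
X_demo = np.random.rand(4, 2297, 1).astype('float32')

y_hat = model.predict(X_demo, verbose=0)  # one float per series, shape (4, 1)

# Regression outputs are floats; round and clip at zero to read them as counts.
counts = np.clip(np.rint(y_hat), 0, None).astype(int).ravel()  # shape (4,)
print(y_hat.shape, counts.shape)
```

With the real data, the model would be compiled and fitted as before; only the pooling layer and the post-processing of y_hat differ from the original code.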

I could not find an answer to this problem in this FAQ.



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
