Why is SparseCategoricalCrossentropy not working with this machine learning model?

I have a .csv database file which looks like this:

              Day   Hour  N1  N2  N3  N4  N5  ...  N14  N15  N16  N17  N18  N19  N20
0      1996-03-18  15:00   4   9  10  16  21  ...   48   62   66   68   73   76   78
1      1996-03-19  15:00   6  12  15  19  28  ...   63   64   67   69   71   75   77
2      1996-03-21  15:00   2   4   6   7  15  ...   52   54   69   72   73   75   77
3      1996-03-22  15:00   3   8  15  17  19  ...   49   60   61   64   67   68   75
4      1996-03-25  15:00   2  10  11  14  18  ...   55   60   61   66   67   75   79
...           ...    ...  ..  ..  ..  ..  ..  ...  ...  ...  ...  ...  ...  ...  ...
13596  2022-01-04  22:50  17  18  22  26  27  ...   64   65   71   72   73   76   80
13597  2022-01-05  15:00   1   5   8  14  15  ...   47   54   59   67   70   72   76
13598  2022-01-05  22:50   6   7  14  15  16  ...   54   55   59   61   70   71   80
13599  2022-01-06  15:00   9  10  11  17  28  ...   51   55   65   67   72   76   78
13600  2022-01-06  22:50   1   2   6   9  11  ...   51   52   54   67   68   73   75

I have found this article: https://machinelearningmastery.com/how-to-develop-convolutional-neural-network-models-for-time-series-forecasting/

I am trying to develop a modified version of that 1D CNN model, using a softmax function on the last layer and SparseCategoricalCrossentropy() as the loss function, and adding new functions of my own to the code.

This is my code so far and the model I am trying to build and use:

# multivariate output 1d cnn example
import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # or any {'0', '1', '2'}
import warnings

warnings.filterwarnings('ignore')
import pandas as pd
from numpy import array
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import *
from tensorflow.keras.losses import *
from tensorflow.keras.layers import *
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.callbacks import ReduceLROnPlateau
from tensorflow.keras.callbacks import ModelCheckpoint


# Define the Required Callback Function
class printlearningrate(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        optimizer = self.model.optimizer
        lr = K.eval(optimizer.lr)
        Epoch_count = epoch + 1
        print('\n', "Epoch:", Epoch_count, ', Learning Rate: {:.7f}'.format(lr))


printlr = printlearningrate()


# split a multivariate sequence into samples
def split_sequences (sequences, n_steps):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the dataset
        if end_ix > len(sequences) - 1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequences[i:end_ix, :], sequences[end_ix, :]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)


df = pd.read_csv('DrawsDB.csv')

print(df)
# df['Time'] = df[['Day', 'Hour']].agg(' '.join, axis=1)
df.insert(0, 'Time', df[['Day', 'Hour']].agg(' '.join, axis=1))
df.drop(columns=['Day', 'Hour'], inplace=True)
df.set_index('Time', inplace=True)
print(df)

numpy_array = df.to_numpy()

print(type(numpy_array))
print(numpy_array)

# choose a number of time steps
n_steps = 10
# convert into input/output
X, y = split_sequences(numpy_array, n_steps)
print(X.shape, y.shape)

# take the number of features from the data (20 here)
n_features = X.shape[2]
# Reduce the learning rate when the loss stops improving:
reduce_lr = ReduceLROnPlateau(monitor='loss', factor=0.9888888888888889,
                              patience=10, min_lr=0.0000001, verbose=1)

epochs = 10
# save the weights with ModelCheckpoint whenever the loss improves:
checkpoint_filepath = 'C:\\Path\\To\\Saved\\CheckPoint\\model\\'
model_checkpoint_callback = ModelCheckpoint(
    filepath=checkpoint_filepath,
    monitor='loss',
    save_best_only=True,
    save_weights_only=True,
    verbose=1)

# define model
model = Sequential()
model.add(Conv1D(filters=64, kernel_size=2, activation=LeakyReLU(), input_shape=(n_steps, n_features)))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(50, activation=LeakyReLU()))
model.add(Dense(n_features))
model.compile(optimizer=Nadam(lr=0.09), loss=SparseCategoricalCrossentropy(),
              metrics=['accuracy', mean_squared_error, mean_absolute_error, mean_absolute_percentage_error])

# fit model
model.fit(X, y, epochs=10, verbose=2, callbacks=[printlr, reduce_lr, model_checkpoint_callback])

The split_sequences function, as its name says, splits the dataset: it takes n_steps consecutive rows as the input and uses the entire following row as the output to predict.
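
To make the windowing concrete, here is a small dependency-free sketch of the same logic on a toy sequence. It uses plain Python lists instead of NumPy arrays, and the names `split_rows` and `toy` are made up for illustration; it is not the code from the script above.

```python
def split_rows(rows, n_steps):
    """Pair each window of n_steps consecutive rows with the row that follows it."""
    X, y = [], []
    for i in range(len(rows) - n_steps):
        X.append(rows[i:i + n_steps])   # input: n_steps consecutive rows
        y.append(rows[i + n_steps])     # target: the whole next row
    return X, y


# Toy data: 5 rows with 2 features each (hypothetical values).
toy = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
X, y = split_rows(toy, n_steps=3)

print(len(X), len(X[0]), len(X[0][0]))  # samples, time steps, features
print(len(y), len(y[0]))                # samples, features
```

Applied to the real data with n_steps = 10, X would have shape (samples, 10, 20) and y would have shape (samples, 20).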

However, I get this error every time I run the Python script:

tensorflow.python.framework.errors_impl.InvalidArgumentError:  logits and labels must have the same first dimension, got logits shape [32,20] and labels shape [640]
     [[node sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits
 (defined at C:\Users\UserName\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\backend.py:5114)
]] [Op:__inference_train_function_1228]

Any idea on how to fix this problem, please?

Thank you in advance!



Solution 1:[1]

Assuming the labels are integers, they have the wrong shape for SparseCategoricalCrossentropy, which expects a single integer class index per sample (see the docs). Try converting your y to one-hot encoded labels:

y = tf.keras.utils.to_categorical(y, num_classes=20)

and change your loss function to CategoricalCrossentropy:

model.compile(optimizer=Nadam(lr=0.09), loss=tf.keras.losses.CategoricalCrossentropy(),
              metrics=['accuracy', mean_squared_error, mean_absolute_error, mean_absolute_percentage_error])

and it should work.
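
To see why the shapes clash, here is a minimal pure-Python sketch of what a sparse categorical cross-entropy computes. It is an illustration, not Keras's implementation: the function names `sparse_cce` and `one_hot` are hypothetical, and the inputs are treated as raw logits. The sparse loss takes exactly one integer class index per sample, whereas the one-hot form takes a target row with the same length as the model output.

```python
import math


def sparse_cce(logits, labels):
    """Mean of -log(softmax(row)[label]) over samples; one integer label per row."""
    if len(labels) != len(logits):
        raise ValueError(
            f"got {len(logits)} rows of logits but {len(labels)} labels"
        )
    total = 0.0
    for row, label in zip(logits, labels):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum - row[label]
    return total / len(logits)


def one_hot(label, num_classes):
    """One-hot row for a class index in the range 0..num_classes - 1."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} is outside 0..{num_classes - 1}")
    return [1.0 if k == label else 0.0 for k in range(num_classes)]


# Hypothetical batch: 2 samples, 3 classes, one class index per sample.
logits = [[2.0, 0.5, 0.1], [0.2, 0.2, 0.2]]
print(sparse_cce(logits, [0, 2]))   # works: 2 rows of logits, 2 labels
print(one_hot(2, num_classes=3))    # [0.0, 0.0, 1.0]

# The situation in the question: every sample carries several target values,
# so flattening them yields more labels than rows of logits.
try:
    sparse_cce(logits, [0, 2, 1, 1, 0, 2])
except ValueError as err:
    print("shape mismatch:", err)
```

In the question, each sample's target is a row of 20 values, so a batch of 32 produces 32 x 20 = 640 labels for only 32 rows of logits, which is the mismatch reported in the error. Like this sketch, a one-hot encoding only has room for labels below num_classes, so it is worth checking the largest value in y (the sample rows go up to 80) against the num_classes you pass.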

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1