Training loss is NaN - but training data is all in range with no nulls

When I execute model.fit(x_train_lstm, y_train_lstm, epochs=3, shuffle=False, verbose=2), I always get the loss as NaN:

Epoch 1/3
73/73 - 5s - loss: nan - accuracy: 0.5417 - 5s/epoch - 73ms/step
Epoch 2/3
73/73 - 5s - loss: nan - accuracy: 0.5417 - 5s/epoch - 74ms/step
Epoch 3/3
73/73 - 5s - loss: nan - accuracy: 0.5417 - 5s/epoch - 73ms/step

My raw x_train is shaped (2475, 48) and y_train is shaped (2475,).

  • I derive my input training set with shape (2315, 160, 48): 2315 training samples, 160 as my lookback time window, and 48 features (a sketch of this windowing is shown below)
  • Correspondingly, y_train holds 0 or 1 labels in shape (2315, 1)
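
Roughly, the windowing is built like this (a minimal sketch with random stand-in data; pairing each window with the label right after it is my reading of the shapes, since 2475 - 160 = 2315):

import numpy as np

# Stand-in data with the same shapes as mine: (2475, 48) features, (2475,) labels
x_train = np.random.uniform(-1, 1, size=(2475, 48)).astype('float32')
y_train = np.random.randint(0, 2, size=(2475,))

win = 160  # lookback time window

# Sample i uses rows [i, i + win) as input; the label is the row right after the window
x_train_lstm = np.stack([x_train[i:i + win] for i in range(len(x_train) - win)])
y_train_lstm = y_train[win:].reshape(-1, 1)

print(x_train_lstm.shape)  # (2315, 160, 48)
print(y_train_lstm.shape)  # (2315, 1)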

All values are in the range (-1, 1) (overview of the training data).

My model is like this:

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 lstm_6 (LSTM)               (None, 160, 128)          90624     
                                                                 
 dropout_4 (Dropout)         (None, 160, 128)          0         
                                                                 
 lstm_7 (LSTM)               (None, 160, 64)           49408     
                                                                 
 dropout_5 (Dropout)         (None, 160, 64)           0         
                                                                 
 lstm_8 (LSTM)               (None, 32)                12416     
                                                                 
 dense_2 (Dense)             (None, 1)                 33        
                                                                 
=================================================================
Total params: 152,481
Trainable params: 152,481
Non-trainable params: 0
  • I tried different LSTM units: 48, 60, 128, 160; none of them worked
  • I checked my training data; all of it is in the range (-1, 1)
  • There is no null in my dataset; x_train.isnull().values.any() outputs False (see the sketch after this list)
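
For reference, isnull() only flags NaN/None; a stricter check with np.isfinite would also catch infinities (a minimal sketch with hypothetical stand-in data):

import numpy as np
import pandas as pd

# Hypothetical frame standing in for x_train: one hidden inf, no nulls
x_train = pd.DataFrame({'a': [0.2, np.inf], 'b': [0.1, -0.5]})

print(x_train.isnull().values.any())          # False: inf does not count as null
print(np.isfinite(x_train.to_numpy()).all())  # False: inf is caught here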

Now I have no clue what else to try.

My model code is:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM
from tensorflow.keras.layers import Dropout

def create_model(win = 100, features = 9):
    model = Sequential()
    model.add(LSTM(units=128, activation='relu', input_shape=(win, features),
        return_sequences=True))
    model.add(Dropout(0.1))
    model.add(LSTM(units=64, activation='relu', return_sequences=True))
    model.add(Dropout(0.2))

    # no need to return sequences from the last LSTM layer
    model.add(LSTM(units=32))

    # adding the output layer
    model.add(Dense(units=1, activation='sigmoid'))

    # may also try mean_squared_error
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model
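
With the shapes above, the model is presumably created with the actual window and feature counts rather than the defaults (a reconstruction on my part, since the call isn't shown; win=160 and features=48 follow from the summary's (None, 160, 128) output shape and 90,624 parameters):

# Reconstructed call matching the (2315, 160, 48) input and the summary above
model = create_model(win=160, features=48)
model.fit(x_train_lstm, y_train_lstm, epochs=3, shuffle=False, verbose=2)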

Here I plot some y_train samples; y_train contains 0 or 1 labels.



Solution 1:[1]


2022.Mar.17 update

After further debugging, I eventually found the problem: my newly added feature contains np.inf. After I removed those rows, my problem was resolved, and I can see the loss value now:

6/6 [==============================] - 2s 50ms/step - loss: 0.6936 - accuracy: 0.5176

Note that np.inf is signed, so make sure both np.inf and -np.inf are removed:

all_dataset = all_dataset[all_dataset.feature_issue != np.inf]
all_dataset = all_dataset[all_dataset.feature_issue != -np.inf]
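
A more general cleanup (a sketch, assuming all_dataset is a pandas DataFrame; the column names here are made up) is to count the non-finite values per column to locate the offenders, then replace +/-inf with NaN and drop those rows in one pass:

import numpy as np
import pandas as pd

# Hypothetical frame standing in for all_dataset; 'feature_issue' hides infs
all_dataset = pd.DataFrame({
    'feature_ok': [0.1, -0.3, 0.7, 0.2],
    'feature_issue': [0.5, np.inf, -np.inf, 0.1],
})

# Count non-finite values per column to locate offending features
print((~np.isfinite(all_dataset)).sum())

# Replace +/-inf with NaN everywhere, then drop those rows
all_dataset = all_dataset.replace([np.inf, -np.inf], np.nan).dropna()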

2022.Mar.16

After some debugging, I found that 2 of my newly added features actually cause the problem. So the problem comes from the data, but unlike the other cases, my data contains no NaN and nothing out of range (originally I thought all the data needed to be normalized).

But I can't tell the reason yet, as they look good.

I will continue researching it tomorrow; any advice is appreciated!


Solution 2:[2]

Two things: try normalizing your time series data, and note that using relu as the activation function for LSTM layers is not conventional (tanh, the default, keeps activations bounded). Check this post for further insights. An example:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM
from tensorflow.keras.layers import Dropout
import tensorflow as tf

# Normalization layer learns per-feature mean and variance from the data via adapt()
layer = tf.keras.layers.Normalization(axis=-1)
x = tf.random.normal((500, 100, 9))                       # dummy inputs: (samples, timesteps, features)
y = tf.random.uniform((500, ), dtype=tf.int32, maxval=2)  # dummy binary labels
layer.adapt(x)

def create_model(win = 100, features = 9):
    model = Sequential()
    model.add(layer)  # normalize inputs as part of the model
    model.add(LSTM(units=128, activation='tanh', input_shape=(win, features),
        return_sequences=True))
    model.add(Dropout(0.1))
    model.add(LSTM(units=64, activation='tanh', return_sequences=True))
    model.add(Dropout(0.2))

    model.add(LSTM(units=32))

    model.add(Dense(units=1, activation='sigmoid'))

    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

model = create_model()

model.fit(x, y, epochs=20)
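
As a side note on the activation choice: tanh is the Keras default for LSTM layers and keeps activations bounded in (-1, 1), while relu is unbounded, so activations can grow across the 160 timesteps until they overflow to inf and the loss becomes NaN. Using tanh is also a requirement for the fast cuDNN LSTM kernel in TensorFlow 2.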

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

[1] Solution 1
[2] Solution 2