GRU (gated recurrent unit) not working on GPU (TensorFlow)

I am trying to train RNN models on my GPU (NVIDIA RTX 3080) using TensorFlow, but GRU layers are not working properly.

Training LSTM models works fine and takes only a few seconds per epoch.

Example

act = "tanh"
recurrent_act = "sigmoid"

lstm_model = tf.keras.models.Sequential([
    tf.keras.layers.LSTM(32, return_sequences=True, activation=act, recurrent_activation=recurrent_act),
    tf.keras.layers.LSTM(32, return_sequences=True, activation=act, recurrent_activation=recurrent_act),
    tf.keras.layers.Dense(units=8),
    tf.keras.layers.Dense(units=1)
])

history = compile_and_fit(lstm_model, wide_window_d) 

# Epoch 1/120
# 368/368 [==============================] - 12s 17ms/step - loss: 0.2664 - mean_absolute_error: 0.2883 - val_loss: 0.0273 - val_mean_absolute_error: 0.0845
# Epoch 2/120
# 368/368 [==============================] - 5s 14ms/step - loss: 0.0067 - mean_absolute_error: 0.0381 - val_loss: 0.0063 - val_mean_absolute_error: 0.0435

However, when I use GRU cells, training takes roughly ten times longer.

gru_model = tf.keras.models.Sequential([
    
    tf.keras.layers.GRU(32, return_sequences=True, activation=act, recurrent_activation=recurrent_act),
    tf.keras.layers.GRU(32, return_sequences=True, activation=act, recurrent_activation=recurrent_act),
    tf.keras.layers.Dense(units=1)
])

history = compile_and_fit(gru_model, wide_window_d)

# Epoch 1/120
# 368/368 [==============================] - 49s 129ms/step - loss: 0.1086 - mean_absolute_error: 0.1560 - val_loss: 0.0101 - val_mean_absolute_error: 0.0498
# Epoch 2/120
# 368/368 [==============================] - 48s 130ms/step - loss: 0.0018 - mean_absolute_error: 0.0210 - val_loss: 0.0038 - val_mean_absolute_error: 0.0320

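For context, Keras only dispatches an RNN layer to the fast fused cuDNN kernel when the layer's configuration meets all of cuDNN's constraints; otherwise it silently falls back to a much slower generic implementation. For `GRU` the documented requirements include `activation='tanh'`, `recurrent_activation='sigmoid'`, `recurrent_dropout=0`, `unroll=False`, `use_bias=True`, and `reset_after=True`. A minimal sketch of a cuDNN-eligible GRU layer (the `compile_and_fit` helper and the real data are not reproduced here; the random input is just a stand-in):

```python
import numpy as np
import tensorflow as tf

# Keras uses the fused cuDNN GRU kernel only when ALL of these hold:
# activation='tanh', recurrent_activation='sigmoid', recurrent_dropout=0,
# unroll=False, use_bias=True, and reset_after=True.
gru = tf.keras.layers.GRU(
    32,
    return_sequences=True,
    activation="tanh",
    recurrent_activation="sigmoid",
    recurrent_dropout=0,
    unroll=False,
    use_bias=True,
    reset_after=True,  # the default in TF2, but required for the cuDNN path
)

x = np.random.rand(4, 24, 8).astype("float32")  # (batch, time, features)
y = gru(x)
print(y.shape)  # (4, 24, 32)
```

In the models above these conditions appear to be met, which is consistent with the slowdown being a TF 2.4 bug rather than a configuration mistake.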
But the biggest problem is with Bidirectional GRU: training slows down until it hangs completely and I have to restart the kernel.

gru_model_bidirectional = tf.keras.models.Sequential([
    
    tf.keras.layers.Bidirectional(tf.keras.layers.GRU(32, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.GRU(32, return_sequences=True)),
    tf.keras.layers.Dense(units=1)
])

history = compile_and_fit(gru_model_bidirectional, wide_window_d)

# Epoch 1/120
#  33/368 [=>............................] - ETA: 1:19 - loss: 0.5314 - mean_absolute_error: 0.4526
It always gets stuck at this point, a few seconds after starting, and I have to restart the kernel.

My specs and versions right now

I am using anaconda.

  • Tensorflow: 2.4.1
  • cudatoolkit: 11.2.1
  • cudnn: 8.1.0.77
  • Python: 3.8

I have tried so far

I have tried installing various versions of TensorFlow (even tf-nightly) and other versions of CUDA and cuDNN, but Bidirectional GRU gets stuck every time.

I have also read that there might be a problem with GPU memory, which is why I added this to my code (right after importing TensorFlow):

config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.compat.v1.Session(config=config)
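Note that this `ConfigProto`/`Session` snippet is TF1-style and has no effect on code that runs eagerly under TF2. The TF2 equivalent of `allow_growth` is `tf.config.experimental.set_memory_growth`, which must be called before any GPU operation runs. A sketch:

```python
import tensorflow as tf

# TF2 equivalent of allow_growth=True; call before the first GPU op.
gpus = tf.config.list_physical_devices("GPU")
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

# On a machine without a visible GPU this loop simply does nothing.
print(gpus)
```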

The problem with some older versions was that they did not support RTX 30xx-series cards.

Note

On CPU all of the models above also work fine, but with larger models training takes too long.

So, if anyone knows why LSTM cells work totally fine (even bidirectional) while GRU cells are problematic, please let me know. Thank you very much.

EDIT 1

Whole code

I have this class to work with models

class MyModel:

    def __init__(self, model):
        self.model = model

    def load_model(self, dir_name):
        self.model = tf.keras.models.load_model(dir_name)

    def eval_mod(self, window, verbose):
        res = self.model.evaluate(window, verbose=verbose)
        print("Loss:", res[0], "MAE:", res[1])

    def save_model(self, name):
        self.model.save("models\\models_05_04_2021\\" + name + ".model")
        
    def retrain_model(self, window, patience, epochs):
        early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=patience, mode='min', restore_best_weights=True)
        history = self.model.fit(window.train, epochs=epochs, validation_data=window.val, callbacks=[early_stopping])
        self.history = history
        
    def compile_and_fit(self, window, patience=3, epochs=120):

        ## callbacks list https://www.tensorflow.org/api_docs/python/tf/keras/callbacks
        early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss',
                                                        patience=patience,
                                                        mode='min', restore_best_weights=True)
        
        # https://github.com/Jaewan-Yun/optimizer-visualization
        # https://www.tensorflow.org/api_docs/python/tf/keras/Model
        opt = tf.optimizers.Adam()
        self.model.compile(loss=tf.losses.MeanSquaredError(),
                    optimizer=opt,
                    metrics=[tf.metrics.MeanAbsoluteError()])

        history = self.model.fit(window.train, epochs=epochs,
                          validation_data=window.val,
                          callbacks=[early_stopping])
        return history

Then training

act = "tanh"
recurrent_act = "sigmoid"

lstm_model = tf.keras.models.Sequential([
    tf.keras.layers.LSTM(512, return_sequences=True, activation=act, recurrent_activation=recurrent_act),
    tf.keras.layers.Dense(32, activation=act),
    tf.keras.layers.Dense(units=1)
])

lstm = MyModel(lstm_model)

history = lstm.compile_and_fit(wide_window_d)

Window is created using WindowGenerator class from https://www.tensorflow.org/tutorials/structured_data/time_series

wide_window_my = WindowGenerator(
    input_width=24,
    label_width=24,
    shift=FUTURE_PERIOD_PREDICT,
    train_df=train_df_my,
    val_df=val_df_my,
    test_df=test_df_my,
    label_columns=['to_predict'])



Solution 1:[1]

Updating TensorFlow to version 2.5 or later solved the issue.
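After upgrading, a quick sanity check (a sketch, not part of the original answer) confirms the installed version and that TensorFlow can see the GPU:

```python
import tensorflow as tf

# Verify the upgrade took effect and the GPU is visible to TensorFlow.
print(tf.__version__)
print(tf.config.list_physical_devices("GPU"))
```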

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
