How to implement an RNN from scratch in TensorFlow?

I'm trying to write a simple RNN layer from the ground up. This is for educational purposes only. I know TensorFlow has keras.layers.SimpleRNN, LSTM, and GRU, which are pretty easy to use. The point of this exercise is to learn to write custom experimental networks.

The math used for the hidden state and output is as follows, where g1 and g2 can both be tanh for this purpose:

    h_t = g1(W_hh · h_(t-1) + W_hx · x_t + b_h)
    y_t = g2(W_yh · h_t + b_y)
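To make the two equations concrete, here is a minimal NumPy sketch of a single timestep (my own illustration, not part of the question's code; shapes follow the column-vector convention used below, with h and x as column vectors):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_hh, W_hx, b_h, W_yh, b_y):
    # h_t = g1(W_hh · h_(t-1) + W_hx · x_t + b_h), with g1 = tanh
    h_t = np.tanh(W_hh @ h_prev + W_hx @ x_t + b_h)
    # y_t = g2(W_yh · h_t + b_y), with g2 = tanh
    y_t = np.tanh(W_yh @ h_t + b_y)
    return h_t, y_t
```

Running a whole sequence is then just folding this function over the timesteps, carrying h_t forward as h_prev.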

This is what I have come up with.

import numpy as np
import tensorflow as tf
from tensorflow import keras

class BasicRNN(keras.layers.Layer):
    def __init__(self, num_units=32):
        super(BasicRNN, self).__init__()
        self.num_units = num_units

    def build(self, input_shape):
        
        batch_size, sequence_length, num_features = input_shape
        
        self.W_hh = self.add_weight(
            shape=(self.num_units, self.num_units),
            initializer="random_normal",
            trainable=True,
        )
        self.W_hx = self.add_weight(
            shape=(self.num_units, num_features),
            initializer="random_normal",
            trainable=True,
        )
        self.b_h = self.add_weight(
            shape=(self.num_units, 1), initializer="random_normal", trainable=True
        )
        self.W_yh = self.add_weight(
            shape=(self.num_units, self.num_units),
            initializer="random_normal",
            trainable=True,
        )
        self.b_y = self.add_weight(
            shape=(self.num_units, 1), initializer="random_normal", trainable=True
        )

    def call(self, input_batch):
        output = []
        
        for sequence in input_batch:
            
            h_t = None
            y_t = None
            h_last = tf.Variable(tf.zeros([self.num_units, 1]))

            for timestep in sequence:
                # print("Timestep:", timestep.get_shape())
                timestep = tf.reshape(timestep, (timestep.get_shape()[0], 1))
                
                h_t = tf.math.tanh(
                    tf.matmul(self.W_hh, h_last) + 
                    tf.matmul(self.W_hx, timestep) + self.b_h)
                y_t = tf.math.tanh(
                    tf.matmul(self.W_yh, h_t) + self.b_y)

                # This is how the RNN carries state across the sequence.
                h_last = h_t

            # Output the final y_t
            output.append(y_t)
            
        return np.array(output)

I am testing the layer as follows.

X = np.array([
    [[1.0], [2.0], [3.0]],
    [[4.0], [5.0], [6.0]],
    [[7.0], [8.0], [9.0]],
    [[10.0], [11.0], [12.0]],
    [[13.0], [14.0], [15.0]]
])

Y = np.array([6.0, 15.0, 24.0, 33.0, 42.0])    

def create_rnn_addition_model():
    model = keras.Sequential()
    model.add(BasicRNN(50))
    model.add(keras.layers.Dense(1)) #Readout layer
    model.compile(optimizer='adam', loss='mean_squared_error')
    
    return model

def train_rnn_addition_model(model):
    model.fit(X, Y, epochs=200, batch_size=2, verbose=False)

def test_rnn_addition_model(model, numbers):
    prediction = model.predict(numbers)
    
    print(prediction)

model = create_rnn_addition_model()

train_rnn_addition_model(model)

I get this error.

TypeError: Expected binary or unicode string, 
got <tf.Tensor 'sequential_25/basic_rnn_25/while/Identity_1:0' 
shape=(50, 1) dtype=float32>

I have spent a long time searching the web for such an implementation, with no luck. I also looked at the source code of keras.layers.SimpleRNN; it inherits from the RNN base class, which is quite complex. In any case, if someone can help me straighten out the code above, I will very much appreciate it. Thank you.
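Editor's note: the error comes from looping over the batch dimension of a symbolic tensor and then wrapping graph tensors in np.array(...), which only works on concrete values. One way to restructure the layer (a sketch, not the official SimpleRNN implementation) is to keep the whole batch together, loop only over timesteps, and store each weight transposed relative to the question's column-vector convention so the batch dimension stays first:

```python
import tensorflow as tf
from tensorflow import keras

class BasicRNN(keras.layers.Layer):
    def __init__(self, num_units=32, **kwargs):
        super().__init__(**kwargs)
        self.num_units = num_units

    def build(self, input_shape):
        _, _, num_features = input_shape
        # Shapes are transposed relative to the question so that
        # activations can be (batch, units) row vectors.
        self.W_hh = self.add_weight(shape=(self.num_units, self.num_units),
                                    initializer="random_normal", trainable=True)
        self.W_hx = self.add_weight(shape=(num_features, self.num_units),
                                    initializer="random_normal", trainable=True)
        self.b_h = self.add_weight(shape=(self.num_units,),
                                   initializer="zeros", trainable=True)
        self.W_yh = self.add_weight(shape=(self.num_units, self.num_units),
                                    initializer="random_normal", trainable=True)
        self.b_y = self.add_weight(shape=(self.num_units,),
                                   initializer="zeros", trainable=True)

    def call(self, inputs):
        # inputs: (batch, time, features). Never loop over the batch axis;
        # process all sequences in the batch at once, one timestep at a time.
        batch_size = tf.shape(inputs)[0]
        h = tf.zeros((batch_size, self.num_units))
        for x_t in tf.unstack(inputs, axis=1):  # (batch, features) per step
            h = tf.math.tanh(tf.matmul(h, self.W_hh) +
                             tf.matmul(x_t, self.W_hx) + self.b_h)
        # Emit only the final timestep's output, as in the question.
        return tf.math.tanh(tf.matmul(h, self.W_yh) + self.b_y)
```

With this version, the training code from the question should run unchanged: keras.Sequential([BasicRNN(50), keras.layers.Dense(1)]) receives a (batch, num_units) tensor from the layer, so the Dense readout and model.fit(X, Y, ...) work as written. Note that tf.unstack requires a statically known sequence length, which holds for the fixed-length X above.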



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
