Implementation difference between TensorFlow LSTMBlockFusedCell and PyTorch LSTM

I am trying to translate a TensorFlow LSTMBlockFusedCell model to a PyTorch LSTM, but I'm not getting the same outputs with identical inputs and weights in both models. I believe this is due to how the weights are being set for the torch model: in the code snippet below, the TensorFlow weight has the shape (164, 400), whilst the PyTorch weights have the shapes (400, 64) and (400, 100) for torch_lstm.weight_ih_l0 and torch_lstm.weight_hh_l0 respectively. I addressed this inconsistency by using the first 64 rows as weight_ih_l0 and the following 100 rows as weight_hh_l0. According to this article, TensorFlow uses right-multiplication whereas PyTorch uses left-multiplication, which is why I need to transpose the weights. I am also setting the bias to 0 (rendering it useless) for debugging.

import tensorflow as tf
import numpy as np
import torch

time_len, batch_size, input_size, num_units = 50, 1, 64, 100 # L, N, Hin, Hout with torch semantics

# setup tensorflow LSTM
tf_lstm = tf.contrib.rnn.LSTMBlockFusedCell(num_units=num_units)
inp = tf.placeholder(tf.float32, shape=(time_len, batch_size, input_size))
out, c = tf_lstm(inp, dtype=tf.float32)
tf_weight = tf_lstm.weights[0]
tf_bias = tf_lstm.weights[1]

# initialize weights
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

# tf forward pass
a = np.random.randn(time_len, batch_size, input_size).astype(np.float32) # input
b = np.zeros(tf_bias.shape) # set lstm bias to zero
tf_out, lstm_weight, lstm_bias = sess.run([out, tf_weight, tf_bias], {inp: a, tf_bias: b})
assert (lstm_bias == 0).all() # make sure lstm bias was 0

# setup pytorch LSTM
torch_lstm = torch.nn.LSTM(input_size=input_size, hidden_size=num_units, num_layers=1, bias=False)

# set torch weights same as tensorflow weights (is this correct?)
w1 = lstm_weight[:input_size, :] # first 64 rows (input-to-hidden part)
w2 = lstm_weight[input_size:, :] # remaining 100 rows (hidden-to-hidden part)
torch_lstm.weight_ih_l0.data = torch.tensor(w1.T) # transpose and set first weight
torch_lstm.weight_hh_l0.data = torch.tensor(w2.T) # transpose and set second weight

# torch forward pass
torch_out, (hn, cn) = torch_lstm(torch.tensor(a))
torch_out = torch_out.detach().numpy() # convert to numpy for compatibility

# compare
assert torch_out.shape == tf_out.shape
print("np.allclose(torch_out, tf_out) = ", np.allclose(torch_out, tf_out))
print("normalized difference: ", np.linalg.norm(torch_out - tf_out))

Output:

np.allclose(torch_out, tf_out) = False
normalized difference: 10.741002

Expected output:

np.allclose(torch_out, tf_out) = True
normalized difference: ~0.0

I am running on CPU with the following dependencies:

numpy==1.21.5
tensorflow-gpu==1.14.0
torch==1.11.0

I am running TensorFlow v1; the CPU version should work as well. The Python wheel is available here for python<=3.7.

Any help is appreciated.
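
The split-and-transpose step described in the question can be checked in isolation. The following is a minimal NumPy sketch, not part of the original post: the names kernel, w_ih and w_hh are illustrative, and random values stand in for real weights. It compares one right-multiplication of the concatenated [x, h] by a (164, 400) kernel against the PyTorch-style sum of two products with the transposed (400, 64) and (400, 100) matrices. If the two agree, the row split and transpose are probably not the source of the mismatch.

```python
import numpy as np

input_size, num_units, batch_size = 64, 100, 1
rng = np.random.default_rng(0)

# stand-in for the fused TensorFlow kernel: rows = [input; hidden], columns = 4 gates
kernel = rng.standard_normal((input_size + num_units, 4 * num_units))
x = rng.standard_normal((batch_size, input_size))
h = rng.standard_normal((batch_size, num_units))

# TensorFlow-style: a single right-multiplication on the concatenated [x, h]
tf_preact = np.concatenate([x, h], axis=1) @ kernel

# PyTorch-style: W_ih x + W_hh h, written here for row-vector batches
w_ih = kernel[:input_size, :].T   # shape (400, 64), like weight_ih_l0
w_hh = kernel[input_size:, :].T   # shape (400, 100), like weight_hh_l0
torch_preact = x @ w_ih.T + h @ w_hh.T

split_matches = np.allclose(tf_preact, torch_preact)
print("pre-activations match:", split_matches)
```

This only covers the gate pre-activations; it says nothing about the order in which each framework slices those 400 columns into gates.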

Solution 1:^[1]

I believe I solved this by changing the order of the weights associated with each gate and by setting forget_bias=0.0 in LSTMBlockFusedCell:

import tensorflow as tf
import numpy as np
import torch
import itertools as it

time_len, batch_size, input_size, num_units = 50, 1, 64, 100 # L, N, Hin, Hout with torch semantics

# setup tensorflow LSTM
tf_lstm = tf.contrib.rnn.LSTMBlockFusedCell(num_units=num_units, forget_bias=0.0, dtype=tf.float32)
inp = tf.placeholder(tf.float32, shape=(time_len, batch_size, input_size))
out, c = tf_lstm(inp, dtype=tf.float32)
tf_weight = tf_lstm.weights[0]
tf_bias = tf_lstm.weights[1]

# initialize weights
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

# tf forward pass
a = np.random.randn(*inp.shape).astype(np.float32) # input
b = np.zeros(tf_bias.shape) # set lstm bias to zero
tf_out, lstm_weight, lstm_bias = sess.run([out, tf_weight, tf_bias], {inp: a, tf_bias: b})
assert (lstm_bias == 0).all() # make sure lstm bias was 0

# setup pytorch LSTM
torch_lstm = torch.nn.LSTM(input_size=input_size, hidden_size=num_units, num_layers=1, bias=False)

# split the TF kernel into four column blocks of width num_units;
# the letters are only positional labels, not TensorFlow's actual gate names
gates = list(zip(np.split(lstm_weight, 4, axis=1), ['i', 'f', 'o', 'g']))

for perm in it.permutations(gates, 4):
    print(*[name for _, name in perm])
    permuted_weight = np.concatenate([block for block, _ in perm], axis=1)

    # set torch weights from the permuted tensorflow kernel
    w1 = permuted_weight[:input_size, :] # first 64 rows (input-to-hidden part)
    w2 = permuted_weight[input_size:, :] # remaining 100 rows (hidden-to-hidden part)
    torch_lstm.weight_ih_l0.data = torch.tensor(w1.T) # transpose and set first weight
    torch_lstm.weight_hh_l0.data = torch.tensor(w2.T) # transpose and set second weight

    # torch forward pass
    torch_out, (hn, cn) = torch_lstm(torch.tensor(a))
    torch_out = torch_out.detach().numpy() # convert to numpy for compatibility

    # compare
    assert torch_out.shape == tf_out.shape
    print("np.allclose(torch_out, tf_out) = ", np.allclose(torch_out, tf_out))
    print("normalized difference: ", np.linalg.norm(torch_out - tf_out))

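This prints the difference for every permutation of the four gate blocks. The permutation printed as i o f g gave a difference of 1.7814435e-06, which is close enough. The letters are only positional labels for the four column blocks of the TensorFlow kernel: since PyTorch stacks its gates as i, f, g, o, the winning permutation means the second and third TensorFlow blocks have to be swapped, which is consistent with TensorFlow storing them as i, g (cell candidate), f, o. Setting forget_bias=0.0 matters because TensorFlow otherwise adds forget_bias (1.0 by default) to the forget gate before the sigmoid, and torch.nn.LSTM has no equivalent term.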
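
Once the ordering is known, the permutation search can be replaced by a fixed conversion. The sketch below is a NumPy-only illustration and does not call either framework. It assumes the TensorFlow block order i, g, f, o inferred from the result above, and the names tf_style_step, torch_style_step and tf_kernel_to_torch are hypothetical helpers written for this example. It runs one LSTM step in each layout and checks that the converted weights give the same hidden and cell state.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tf_style_step(x, h, c, kernel, forget_bias=0.0):
    """One LSTM step with a fused kernel; ASSUMED column block order: i, g, f, o."""
    z = np.concatenate([x, h], axis=1) @ kernel
    i, g, f, o = np.split(z, 4, axis=1)
    c_new = sigmoid(f + forget_bias) * c + sigmoid(i) * np.tanh(g)
    return sigmoid(o) * np.tanh(c_new), c_new

def torch_style_step(x, h, c, w_ih, w_hh):
    """One bias-free LSTM step with PyTorch's row block order: i, f, g, o."""
    z = x @ w_ih.T + h @ w_hh.T
    i, f, g, o = np.split(z, 4, axis=1)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    return sigmoid(o) * np.tanh(c_new), c_new

def tf_kernel_to_torch(kernel, input_size):
    """Hypothetical helper: reorder i, g, f, o -> i, f, g, o, split rows, transpose."""
    i, g, f, o = np.split(kernel, 4, axis=1)
    reordered = np.concatenate([i, f, g, o], axis=1)
    return reordered[:input_size].T.copy(), reordered[input_size:].T.copy()

input_size, num_units = 64, 100
rng = np.random.default_rng(0)
kernel = rng.standard_normal((input_size + num_units, 4 * num_units)) * 0.1
x = rng.standard_normal((1, input_size))
h = rng.standard_normal((1, num_units))
c = rng.standard_normal((1, num_units))

w_ih, w_hh = tf_kernel_to_torch(kernel, input_size)
h_tf, c_tf = tf_style_step(x, h, c, kernel, forget_bias=0.0)
h_pt, c_pt = torch_style_step(x, h, c, w_ih, w_hh)
print("match with forget_bias=0.0:", np.allclose(h_tf, h_pt) and np.allclose(c_tf, c_pt))

# with a forget_bias of 1.0 the same weights no longer agree
h_tf_default, _ = tf_style_step(x, h, c, kernel, forget_bias=1.0)
print("match with forget_bias=1.0:", np.allclose(h_tf_default, h_pt))
```

With the real models, the two arrays returned by tf_kernel_to_torch would be assigned to torch_lstm.weight_ih_l0.data and torch_lstm.weight_hh_l0.data via torch.tensor, as in the loop above. If biases are kept, the TensorFlow bias vector would presumably need the same block reordering.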

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Kevin
