Creating transformer layers in a for loop on a multi-GPU setup causes errors
I created a type of transformer network in TensorFlow. When running eagerly on one GPU the code works fine, but when I try to run it on multiple GPUs using the @tf.function decorator and tf.distribute.MirroredStrategy() I receive a flood of errors.
I believe the problem is the list of ComparatorLayers and the for loop in the call method.
I tried changing the for loop to:

    for i in tf.range(self.num_layers):
        x, att_weights = self.comp_layers[i](x, enc_output, training, padding_mask)
I even tried using tf.gather:

    for i in tf.range(self.num_layers):
        x = tf.gather(self.comp_layers, i)(x, enc_output, training, padding_mask)
But I just don't understand TensorFlow well enough. I know TensorFlow is trying to build a graph and it just doesn't work with these Python side effects. Can anyone please help with the correct way of doing this?
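In case it helps, here is a tiny reproduction of what I think is going wrong (plain Dense layers stand in for my ComparatorLayer): tf.range produces a symbolic tensor while the function is being traced, and a Python list of layer objects can neither be indexed with that tensor nor be passed to tf.gather, since it cannot be converted to a tensor.

    import tensorflow as tf

    layers = [tf.keras.layers.Dense(4) for _ in range(3)]

    @tf.function
    def apply_layers(x):
        for i in tf.range(len(layers)):  # i is a symbolic Tensor during tracing
            x = layers[i](x)             # fails: a Python list cannot be indexed with a Tensor
        return x

    apply_layers(tf.ones((2, 4)))  # raises when the function is traced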
The original code:
    import tensorflow as tf

    class Comparator(tf.keras.layers.Layer):
        def __init__(self, num_layers, d_model, num_heads, dff, maximum_position_encoding, rate=0.1):
            super(Comparator, self).__init__()
            self.d_model = d_model
            self.num_layers = num_layers
            self.pos_encoding = positional_encoding(maximum_position_encoding, self.d_model)
            # one ComparatorLayer per requested layer, held in a plain Python list
            self.comp_layers = [ComparatorLayer(self.d_model, num_heads, dff, rate)
                                for _ in range(num_layers)]
            self.dropout = tf.keras.layers.Dropout(rate)

        def call(self, x, enc_output, training, padding_mask):
            seq_len = tf.shape(x)[1]
            attention_weights = {}  # needs to exist before the loop fills it
            x += self.pos_encoding[:, :seq_len, :]
            x = self.dropout(x, training=training)
            for i in range(self.num_layers):
                x, att_weights = self.comp_layers[i](x, enc_output, training, padding_mask)
                attention_weights[f'comparator_layer{i+1}'] = att_weights
            return x, attention_weights
How do I correctly create and apply the number of layers given by the parameter 'num_layers'?
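For context, this is the pattern I am trying to reach: build the layers once in __init__ from num_layers with a list comprehension, then loop over the Python list directly in call. As far as I understand, AutoGraph simply unrolls such a loop at trace time, so no tensor-valued index is needed (ComparatorLayer and positional_encoding are the helpers from my code above, and I am not sure this alone resolves the MirroredStrategy errors):

    class Comparator(tf.keras.layers.Layer):
        def __init__(self, num_layers, d_model, num_heads, dff,
                     maximum_position_encoding, rate=0.1):
            super().__init__()
            self.pos_encoding = positional_encoding(maximum_position_encoding, d_model)
            # num_layers independent ComparatorLayer instances in a Python list
            self.comp_layers = [ComparatorLayer(d_model, num_heads, dff, rate)
                                for _ in range(num_layers)]
            self.dropout = tf.keras.layers.Dropout(rate)

        def call(self, x, enc_output, training, padding_mask):
            seq_len = tf.shape(x)[1]
            attention_weights = {}
            x += self.pos_encoding[:, :seq_len, :]
            x = self.dropout(x, training=training)
            # iterate the list directly; enumerate gives a plain Python int,
            # so the loop is unrolled while tracing and each key stays a Python string
            for i, layer in enumerate(self.comp_layers):
                x, att_weights = layer(x, enc_output, training, padding_mask)
                attention_weights[f'comparator_layer{i + 1}'] = att_weights
            return x, attention_weights

Because the loop is unrolled during tracing, each attention-weight entry ends up as a separate tensor in the graph, which is why the Python dictionary itself does not cause a problem here.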
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
