Creating transformer layers in a for loop on a multi-GPU setup causes errors

I created a type of transformer network in TensorFlow. When running eagerly on one GPU, the code works fine, but when I try to run it on multiple GPUs using the @tf.function decorator and tf.distribute.MirroredStrategy(), I receive tons of errors.
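For context, this is roughly how I set things up; the hyperparameter values here are placeholders, not my real ones, and the real training step is more involved:

strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    # Model is built inside the strategy scope so its variables are mirrored.
    comparator = Comparator(num_layers=4, d_model=128, num_heads=8,
                            dff=512, maximum_position_encoding=1000)

@tf.function
def step(x, enc_output, padding_mask):
    out, att = comparator(x, enc_output, True, padding_mask)
    return out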

I believe the problem is with the list of ComparatorLayer instances and the for loop in the call method.

I tried changing the for loop to:

for i in tf.range(self.num_layers):
    x, att_weights = self.comp_layers[i](x, enc_output, training, padding_mask)

I even tried using tf.gather:

for i in tf.range(self.num_layers):
    x = tf.gather(self.comp_layers, i)(x, enc_output, training, padding_mask)

But I just don't understand TensorFlow well enough. I know TensorFlow is trying to build a graph, and it doesn't play well with these Python side effects. Can anyone please help with the correct way of doing this?
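From what I have read, a plain Python for loop over a Python list should simply be unrolled while tf.function traces the call, whereas tf.range builds a graph-level loop whose index is a tensor, and a tensor cannot index a Python list. Here is a tiny standalone sketch of that understanding (the Dense layers are just stand-ins for my ComparatorLayer):

import tensorflow as tf

layers = [tf.keras.layers.Dense(16) for _ in range(3)]

@tf.function
def apply_all(x):
    # The loop runs once, during tracing; each layer call is baked
    # into the graph, so indexing the Python list works.
    for layer in layers:
        x = layer(x)
    return x

print(apply_all(tf.ones((2, 16))).shape)  # (2, 16)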

The original code:

class Comparator(tf.keras.layers.Layer):
    def __init__(self, num_layers, d_model, num_heads, dff, maximum_position_encoding, rate=0.1):
        super(Comparator, self).__init__()

        self.d_model = d_model
        self.num_layers = num_layers

        self.pos_encoding = positional_encoding(maximum_position_encoding, self.d_model)

        # Build the stack of sub-layers as a plain Python list.
        self.comp_layers = [ComparatorLayer(self.d_model, num_heads, dff, rate) for _ in range(num_layers)]
        self.dropout = tf.keras.layers.Dropout(rate)

    def call(self, x, enc_output, training, padding_mask):

        seq_len = tf.shape(x)[1]
        attention_weights = {}

        # Add positional encoding, then apply dropout.
        x += self.pos_encoding[:, :seq_len, :]
        x = self.dropout(x, training=training)

        # Run the input through each ComparatorLayer in turn,
        # collecting the attention weights per layer.
        for i in range(self.num_layers):
            x, att_weights = self.comp_layers[i](x, enc_output, training, padding_mask)
            attention_weights[f'comparator_layer{i+1}'] = att_weights

        return x, attention_weights

What is the correct way to create the layers based on the num_layers parameter so that this works under tf.function and MirroredStrategy?
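For completeness, this is how I exercise the layer eagerly on a single GPU, where it works fine (the shapes and hyperparameters below are placeholders, and ComparatorLayer and positional_encoding come from the rest of my code):

comparator = Comparator(num_layers=2, d_model=64, num_heads=4,
                        dff=128, maximum_position_encoding=200)

x = tf.random.uniform((8, 50, 64))           # (batch, seq_len, d_model)
enc_output = tf.random.uniform((8, 60, 64))  # encoder output, same d_model
out, att = comparator(x, enc_output, training=False, padding_mask=None)
print(out.shape, list(att.keys()))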


