Calling a model inside a GradientTape scope vs. calling it in a loss function

Is there a difference in the gradient computation between the two code snippets below?

Code 1:

with tf.GradientTape() as tape:

    output_A = model_A(input)
    output_B = model_B(input)

    loss = loss_fn(output_A, output_B, true_output_A, true_output_B)

grads = tape.gradient(loss, model_A.trainable_variables)
optimizer.apply_gradients(zip(grads, model_A.trainable_variables))

# ------------------------------------------------------------------------
MSE = tf.keras.losses.MeanSquaredError()

def loss_fn(output_A, output_B, true_output_A, true_output_B):

    loss = MSE(output_A, true_output_A) + MSE(output_B, true_output_B)

    return loss

Code 2:

with tf.GradientTape() as tape:

    output_A = model_A(input)

    loss = loss_fn(output_A, model_B, input, true_output_A, true_output_B)

grads = tape.gradient(loss, model_A.trainable_variables)
optimizer.apply_gradients(zip(grads, model_A.trainable_variables))

# ------------------------------------------------------------------------
MSE = tf.keras.losses.MeanSquaredError()

def loss_fn(output_A, model_B, input, true_output_A, true_output_B):

    output_B = model_B(input)

    loss = MSE(output_A, true_output_A) + MSE(output_B, true_output_B)

    return loss
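For what it's worth, here is how I would check this empirically: run both versions on the same small models and data and compare the resulting gradients elementwise. The toy models, shapes, and data below are placeholders I made up for the test, not my real setup:

import tensorflow as tf

tf.random.set_seed(0)

# Toy stand-ins for the real models (placeholder sizes)
model_A = tf.keras.Sequential([tf.keras.layers.Dense(4)])
model_B = tf.keras.Sequential([tf.keras.layers.Dense(4)])

MSE = tf.keras.losses.MeanSquaredError()

x = tf.random.normal((8, 3))
true_output_A = tf.random.normal((8, 4))
true_output_B = tf.random.normal((8, 4))

# Version 1: both models are called directly inside the tape scope
with tf.GradientTape() as tape:
    output_A = model_A(x)
    output_B = model_B(x)
    loss_1 = MSE(output_A, true_output_A) + MSE(output_B, true_output_B)

grads_1 = tape.gradient(loss_1, model_A.trainable_variables)

# Version 2: model_B is called inside the loss function instead
def loss_fn(output_A, model_B, x, true_output_A, true_output_B):
    output_B = model_B(x)
    return MSE(output_A, true_output_A) + MSE(output_B, true_output_B)

with tf.GradientTape() as tape:
    output_A = model_A(x)
    loss_2 = loss_fn(output_A, model_B, x, true_output_A, true_output_B)

grads_2 = tape.gradient(loss_2, model_A.trainable_variables)

# Largest elementwise difference between the two gradient sets
for g_1, g_2 in zip(grads_1, grads_2):
    print(tf.reduce_max(tf.abs(g_1 - g_2)).numpy())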

output_A and output_B are related by a mathematical equation. I would like model_A to learn to generate output_A from the way model_B generates output_B.

I hope that makes sense.
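In case it helps to picture the setup: since model_B's variables are not in the list passed to tape.gradient, model_B is never updated either way. If it is meant to act as a fixed reference, one variation I am considering is to turn the relating equation into a training target for model_A via tf.stop_gradient. Here relate is a made-up placeholder for the actual equation, and the toy model_A, model_B, MSE, and x from the check above are reused:

optimizer = tf.keras.optimizers.Adam()

def relate(output_B):
    # Made-up placeholder for the actual equation linking output_B to output_A
    return output_B

with tf.GradientTape() as tape:
    output_A = model_A(x)
    # stop_gradient treats model_B's output as a constant target,
    # so gradients flow only through model_A
    target_A = tf.stop_gradient(relate(model_B(x)))
    loss = MSE(output_A, target_A)

grads = tape.gradient(loss, model_A.trainable_variables)
optimizer.apply_gradients(zip(grads, model_A.trainable_variables))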


