Calling a model inside a GradientTape scope vs. calling it in a loss function
Is there a difference in the gradient computation between the two code snippets below?
Code 1:
with tf.GradientTape() as tape:
    output_A = model_A(input)
    output_B = model_B(input)  # model_B is called directly inside the tape scope
    loss = loss_fn(output_A, output_B, true_output_A, true_output_B)
grads = tape.gradient(loss, model_A.trainable_variables)
optimizer.apply_gradients(zip(grads, model_A.trainable_variables))
# ------------------------------------------------------------------------
MSE = tf.keras.losses.MeanSquaredError()
def loss_fn(output_A, output_B, true_output_A, true_output_B):
    loss = MSE(output_A, true_output_A) + MSE(output_B, true_output_B)
    return loss
Code 2:
with tf.GradientTape() as tape:
    output_A = model_A(input)
    loss = loss_fn(output_A, model_B, input, true_output_A, true_output_B)
grads = tape.gradient(loss, model_A.trainable_variables)
optimizer.apply_gradients(zip(grads, model_A.trainable_variables))
# ------------------------------------------------------------------------
MSE = tf.keras.losses.MeanSquaredError()
def loss_fn(output_A, model_B, input, true_output_A, true_output_B):
    output_B = model_B(input)  # model_B is called inside loss_fn, which runs inside the tape scope
    loss = MSE(output_A, true_output_A) + MSE(output_B, true_output_B)
    return loss
output_A and output_B are related by a mathematical equation. I would like model_A to learn how to generate output_A in the same way that model_B generates output_B.
I hope that makes sense.
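For concreteness, here is a minimal, self-contained sketch that computes the gradients with respect to model_A's trainable variables both ways so the two variants can be compared directly. The toy Dense models, the random data, and the names x, loss_1, loss_2, grads_1, grads_2 are hypothetical stand-ins and are not part of the original question.

import numpy as np
import tensorflow as tf

tf.random.set_seed(0)
np.random.seed(0)

# Hypothetical stand-ins for model_A and model_B from the question.
model_A = tf.keras.Sequential([tf.keras.layers.Dense(4, activation="relu"),
                               tf.keras.layers.Dense(1)])
model_B = tf.keras.Sequential([tf.keras.layers.Dense(4, activation="relu"),
                               tf.keras.layers.Dense(1)])

x = tf.constant(np.random.rand(8, 3), dtype=tf.float32)  # stand-in for `input`
true_output_A = tf.constant(np.random.rand(8, 1), dtype=tf.float32)
true_output_B = tf.constant(np.random.rand(8, 1), dtype=tf.float32)

MSE = tf.keras.losses.MeanSquaredError()

# Variant 1: both models are called directly inside the tape scope.
with tf.GradientTape() as tape:
    output_A = model_A(x)
    output_B = model_B(x)
    loss_1 = MSE(output_A, true_output_A) + MSE(output_B, true_output_B)
grads_1 = tape.gradient(loss_1, model_A.trainable_variables)

# Variant 2: model_B is called inside the loss function, which itself
# runs inside the tape scope, so its operations are still recorded.
def loss_fn(output_A, model_B, x, true_output_A, true_output_B):
    output_B = model_B(x)
    return MSE(output_A, true_output_A) + MSE(output_B, true_output_B)

with tf.GradientTape() as tape:
    output_A = model_A(x)
    loss_2 = loss_fn(output_A, model_B, x, true_output_A, true_output_B)
grads_2 = tape.gradient(loss_2, model_A.trainable_variables)

# Compare the gradients w.r.t. model_A's trainable variables.
for g_1, g_2 in zip(grads_1, grads_2):
    print(np.allclose(g_1.numpy(), g_2.numpy()))

Since loss_fn in the second variant is invoked inside the with tf.GradientTape() block, the forward pass of model_B is still executed and recorded while the tape is active; the printed comparison shows whether the resulting gradients for model_A differ between the two snippets.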
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow