Peculiar use of @tf.custom_gradient in StyleGAN
I've been reading the source code of the StyleGAN implementation, and I cannot understand their peculiar use of the @tf.custom_gradient decorator. Let us take the concrete example of their implementation of leaky ReLU. The way I would do it is as follows:
```python
def myLRelu(x, alpha=0.2):
    alpha = tf.constant(alpha, dtype=x.dtype, name='alpha')

    @tf.custom_gradient
    def func(x):
        y = tf.maximum(x, x * alpha)
        def grad(dy):
            dx = tf.where(y >= 0, dy, dy * alpha)
            return dx
        return y, grad
    return func(x)
```
This follows the TensorFlow documentation for tf.custom_gradient. But in the StyleGAN code, they implement it as follows (I removed the variable_scope in my version above, as I'm not sure what it does):
```python
def leaky_relu(x, alpha=0.2):
    with tf.variable_scope('LeakyReLU'):
        alpha = tf.constant(alpha, dtype=x.dtype, name='alpha')

        @tf.custom_gradient
        def func(x):
            y = tf.maximum(x, x * alpha)

            @tf.custom_gradient
            def grad(dy):
                dx = tf.where(y >= 0, dy, dy * alpha)
                return dx, lambda ddx: tf.where(y >= 0, ddx, ddx * alpha)
            return y, grad
        return func(x)
```
Two @tf.custom_gradient decorators are used, and I don't understand why, since there clearly aren't any second-order derivatives being computed (they are identically 0 for leaky ReLU anyway). Is this a trick to somehow speed up computations? If so, how does it work?
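For reference, here is what the inner decorator is expressing, written out in plain NumPy (the function names below are hypothetical, chosen just for this sketch). Since dx = where(y >= 0, dy, dy * alpha) is linear in dy, the gradient of the gradient op with respect to dy is the very same masking operation, which is exactly what the lambda in the StyleGAN code returns:

```python
import numpy as np

def leaky_relu_forward(x, alpha=0.2):
    # y = max(x, alpha * x), elementwise
    return np.maximum(x, x * alpha)

def leaky_relu_grad(y, dy, alpha=0.2):
    # First-order gradient: pass dy through where y >= 0, scale by alpha elsewhere
    return np.where(y >= 0, dy, dy * alpha)

def grad_of_grad(y, ddx, alpha=0.2):
    # dx is linear in dy, so the gradient of the grad op is the same mask again
    # (this is what the inner lambda in the StyleGAN code computes)
    return np.where(y >= 0, ddx, ddx * alpha)

x = np.array([-2.0, -0.5, 0.0, 1.5])
y = leaky_relu_forward(x)
dy = np.ones_like(x)
print(leaky_relu_grad(y, dy).tolist())  # [0.2, 0.2, 1.0, 1.0]
print(grad_of_grad(y, dy).tolist())     # [0.2, 0.2, 1.0, 1.0]
```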
EDIT: To clarify why I think this is somehow a "trick" to make gradient computation faster, the authors make the following comment in the code:

```python
# High-level ops for manipulating 4D activation tensors.
# The gradients of these are meant to be as efficient as possible.
```
And for completeness, here is the repo from which I took the code.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow