TensorFlow: Automatic Differentiation of the Softmax Function
New to TensorFlow. I am trying to implement the gradient of the softmax function via tf.GradientTape:
import numpy as np
import tensorflow as tf

def soft_max_grad(y, x, c):
    xy = tf.matmul(x, y)                     # Nc x N
    e_nc = tf.ones([10, 1])
    # subtract the column-wise max for numerical stability
    s = tf.reduce_max(xy, axis=0)
    s = tf.reshape(s, [1, tf.shape(s)[0]])
    n = 40000
    S = xy - tf.matmul(e_nc, s)              # shifted scores, Nc x N
    e_n = tf.ones([n, 1])
    S = tf.Variable(S)
    c = tf.Variable(c)
    with tf.GradientTape() as tape:
        E = (-1 / n) * tf.matmul(tf.transpose(e_nc), tf.matmul(c * S, e_n)) + \
            (1 / n) * tf.matmul(tf.math.log(tf.matmul(tf.transpose(e_nc), tf.exp(S))), e_n)
    grads = tape.gradient(E, [S, c])         # gradients w.r.t. S and c
    return grads
Where:
y = tf.random.normal(shape = [785,40000])
c = tf.random.normal(shape = [10,40000])
x = tf.zeros([10, 785])
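For reference, with these shapes tf.matmul(x, y) and therefore S are [10, 40000], and tape.gradient returns one tensor per watched variable with the same shape as that variable. A quick check of what the function above actually returns (assuming TensorFlow 2.x eager execution):

grads = soft_max_grad(y, x, c)
print(grads[0].shape)   # (10, 40000), the shape of S, not (10, 785)
print(grads[1].shape)   # (10, 40000), the shape of c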
The output grads is the wrong size; it should be [10, 785]. A function that I already have working for the gradient of the softmax is the following, which gives a [10, 785] output:
def softmax_grad1(y, x, c):
    S = np.matmul(x, y)                       # Nc x N
    n = np.shape(c)[1]                        # number of samples
    e_nc = np.ones((np.shape(c)[0], 1))       # Nc x 1 vector of ones
    # -C + softmax(S), computed column-wise
    interior = -c + np.exp(S) * np.matmul(e_nc, 1 / np.matmul(e_nc.transpose(), np.exp(S)))
    grad = (1 / n) * np.matmul(interior, y.transpose())   # Nc x d, i.e. 10 x 785
    return grad
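For comparison, one way to get the same [10, 785] result out of the tape is to differentiate with respect to x rather than S, building S from x inside the tape so the chain rule through tf.matmul(x, y) is recorded. A minimal sketch (the name soft_max_grad_x is just for illustration), assuming TensorFlow 2.x; the max-shift used for numerical stability in the question is dropped so that it matches softmax_grad1, and keeping it would need something like tf.stop_gradient on the shift, since the columns of c do not sum to one:

def soft_max_grad_x(y, x, c):
    nc, n = c.shape                       # 10, 40000 for the inputs above
    e_nc = tf.ones([nc, 1])
    e_n = tf.ones([n, 1])
    x = tf.Variable(x)                    # differentiate w.r.t. x (Nc x d)
    with tf.GradientTape() as tape:
        S = tf.matmul(x, y)               # built inside the tape, Nc x N
        E = (-1 / n) * tf.matmul(tf.transpose(e_nc), tf.matmul(c * S, e_n)) + \
            (1 / n) * tf.matmul(tf.math.log(tf.matmul(tf.transpose(e_nc), tf.exp(S))), e_n)
    return tape.gradient(E, x)            # Nc x d, i.e. [10, 785]

With the inputs above this returns a [10, 785] tensor that should agree with softmax_grad1(y, x, c) up to floating-point error, since dE/dx = (dE/dS) y^T and dE/dS = (1/n)(-c + softmax(S)) column-wise.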
Overall I am trying to implement the following scheme (which is encoded in softmax_grad1) via AutoDiff:
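The image of the scheme is missing here; reconstructed from softmax_grad1 and the expression for E in the tape code, it appears to be (with $\circ$ the element-wise product and the inverse taken element-wise):

$$
E = -\frac{1}{n}\, e_{n_c}^{\top}\,(C \circ S)\, e_n \;+\; \frac{1}{n}\,\log\!\big(e_{n_c}^{\top} e^{S}\big)\, e_n,
\qquad S = XY,
$$

$$
\nabla_X E = \frac{1}{n}\Big(-C + e^{S} \circ \big(e_{n_c}\,(e_{n_c}^{\top} e^{S})^{-1}\big)\Big)\, Y^{\top},
$$

where the second term inside the parentheses is the column-wise softmax of $S$.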
Sources
Source: Stack Overflow, licensed under CC BY-SA 3.0.