Tensorflow: Automatic Differentiation of Softmax Function

I'm new to TensorFlow. I am trying to implement the gradient of the softmax function via tf.GradientTape:

import tensorflow as tf

def soft_max_grad(y, x, c):
    xy = tf.matmul(x, y)                      # Nc x N, here [10, 40000]
    e_nc = tf.ones([10, 1])                   # all-ones vector of length Nc
    s = tf.reduce_max(xy, axis=0)
    s = tf.reshape(s, [1, tf.shape(s)[0]])    # column-wise max, shape [1, N]
    n = 40000
    S = xy - tf.matmul(e_nc, s)               # shift each column by its max for stability
    e_n = tf.ones([n, 1])                     # all-ones vector of length N

    e_n = tf.constant(e_n)
    e_nc = tf.constant(e_nc)
    S = tf.Variable(S)
    c = tf.Variable(c)

    with tf.GradientTape() as tape:
        E = (-1 / n) * tf.matmul(tf.transpose(e_nc), tf.matmul(c * S, e_n)) + \
            (1 / n) * tf.matmul(tf.math.log(tf.matmul(tf.transpose(e_nc), tf.exp(S))), e_n)

    grads = tape.gradient(E, [S, c])
    return grads

Where:

y = tf.random.normal(shape = [785,40000])
c = tf.random.normal(shape = [10,40000])
x = tf.zeros([10, 785])
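
For reference, here is a minimal driver showing how I call it (a sketch, assuming eager execution and the tensors defined above). Since tape.gradient returns one gradient per source, shaped like that source, the result follows S and c rather than x:

grads = soft_max_grad(y, x, c)
print(grads[0].shape, grads[1].shape)  # both (10, 40000), the shapes of S and c, not (10, 785)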

The output grads is the wrong size; it should be [10, 785]. A function that I already have working for the gradient of the softmax is the following, which gives a [10, 785] output:

import numpy as np

def softmax_grad1(y, x, c):
    S = np.matmul(x, y)                                  # Nc x N
    n = np.shape(c)[1]                                   # number of samples N
    e_nc = np.ones((np.shape(c)[0], 1))                  # all-ones vector of length Nc

    # -C plus the column-wise softmax of S: exp(S) divided by its column sums
    interior = -c + np.exp(S) * np.matmul(e_nc, 1 / np.matmul(e_nc.transpose(), np.exp(S)))
    grad = (1 / n) * np.matmul(interior, y.transpose())  # Nc x 785

    return grad
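
By contrast, a minimal sketch of how I call the NumPy version (assuming plain NumPy arrays of the same sizes, since it relies on .transpose()):

import numpy as np

y_np = np.random.randn(785, 40000)
c_np = np.random.randn(10, 40000)
x_np = np.zeros((10, 785))

grad = softmax_grad1(y_np, x_np, c_np)
print(grad.shape)  # (10, 785), the size I want from the GradientTape version as well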

Overall I am trying to implement the following scheme (which is encoded in softmax_grad1) via AutoDiff:

[Two equation images, labeled Step 1 and Step 2, not reproduced here.]

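Written out (my reading of the two functions above, so treat the notation as an assumption: e_n and e_{N_c} are all-ones vectors, C is the matrix c, \circ is the elementwise product and \oslash the elementwise division):

Step 1, the loss evaluated inside the tape:

    E = -\frac{1}{n}\, e_{N_c}^{\top} (C \circ S)\, e_n
        + \frac{1}{n}\, \log\!\big(e_{N_c}^{\top} e^{S}\big)\, e_n,
    \qquad S = x\,y

Step 2, its gradient with respect to x, which is what softmax_grad1 computes:

    \nabla_x E = \frac{1}{n}\,\big(\sigma(S) - C\big)\, y^{\top},
    \qquad \sigma(S) = e^{S} \oslash \big(e_{N_c}\, e_{N_c}^{\top}\, e^{S}\big)

where \sigma(S) is the column-wise softmax of S.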