tf.IndexedSlicesValue when returned from tf.gradients()

I have four embedding matrices and want to get the gradients of my loss function with respect to each of them.

When I run the session to fetch the gradient values, two of the returned objects are of type tensorflow.python.framework.ops.IndexedSlicesValue and the other two are numpy arrays. For the numpy arrays, the shape matches that of the corresponding embedding matrix, but I'm having problems with the IndexedSlicesValue objects.

If I call .values on one of those objects, I get an array whose shape does not match that of the gradient. The embedding matrix has shape [22,30], but .values on the IndexedSlicesValue object returns an array with shape [4200,30]. (My input tensor had shape [30,20,7], and the product of those dimensions is 4200; I'm not sure whether this is relevant.) The IndexedSlicesValue object also has an attribute called dense_shape, an array holding the dimensions the gradient should have, i.e. .dense_shape returns array([22,30]).

I don't really understand the docs here: https://www.tensorflow.org/versions/r0.7/api_docs/python/state_ops.html#IndexedSlices

It says:

An IndexedSlices is typically used to represent a subset of a larger tensor dense of shape [LARGE0, D1, .. , DN] where LARGE0 >> D0. The values in indices are the indices in the first dimension of the slices that have been extracted from the larger tensor.

So this array of shape (4200,30) is extracted from an array corresponding to an even larger, dense tensor?

What exactly is the gradient in this IndexedSlicesValue object and why does tensorflow automatically use this type for some gradients returned by tf.gradients()?

Here is my code:

import tensorflow as tf

input_tensor = tf.placeholder(tf.int32, shape = [None, memory_size, max_sent_length], name = 'Input')
q_tensor = tf.placeholder(tf.int32, shape = [None,max_sent_length], name = 'Question')
a_tensor = tf.placeholder(tf.float32, shape = [None,V+1], name = 'Answer')
# Embedding matrices
A_prior = tf.get_variable(name = 'A', shape = [V+1,d], initializer = tf.random_normal_initializer(stddev = 0.1))
A = tf.concat(0,[tf.zeros(shape = tf.pack([1,tf.shape(A_prior)[1]])),tf.slice(A_prior,[1,0],[-1,-1])])
B = tf.get_variable(name = 'B', shape = [V+1,d], initializer = tf.random_normal_initializer(stddev = 0.1))
C = tf.get_variable(name = 'C', shape = [V+1,d], initializer = tf.random_normal_initializer(stddev = 0.1))
W = tf.get_variable(name = 'W', shape = [V+1,d], initializer= tf.random_normal_initializer(stddev = 0.1))
embeddings = tf.reduce_sum(tf.nn.embedding_lookup(A,input_tensor),2)
u = tf.reshape(tf.reduce_sum(tf.nn.embedding_lookup(B,q_tensor),1),[-1,1,d])
test = tf.transpose(embeddings, perm = [0,2,1])
test_batch_mul = tf.squeeze(tf.batch_matmul(u,test))
cond = tf.not_equal(test_batch_mul,0.0)
tt = tf.fill(tf.shape(test_batch_mul),-1000.0)
softmax_in = tf.select(cond, test_batch_mul, tt)
p_values = tf.nn.softmax(softmax_in)
c_values = tf.reduce_sum(tf.nn.embedding_lookup(C,input_tensor),2)
o = tf.squeeze(tf.batch_matmul(tf.expand_dims(p_values,1),c_values))
a_pred = tf.nn.softmax(tf.matmul(tf.squeeze(u)+o,tf.transpose(W)))
loss = tf.nn.softmax_cross_entropy_with_logits(a_pred, a_tensor, name = 'loss')
cost = tf.reduce_mean(loss)
global_step = tf.Variable(0,name = 'global_step', trainable= False)
#optimizer = tf.train.MomentumOptimizer(0.01,0.9)
vars_list = tf.trainable_variables()
grads = tf.gradients(cost, vars_list)
#train_op = optimizer.minimize( cost, global_step, vars_list, name = 'train_op')

sess = tf.Session()
init = tf.initialize_all_variables()
sess.run(init)
input_feed = {input_tensor : phrases, q_tensor : questions, a_tensor : answers}
grad_results = sess.run(grads, feed_dict = input_feed)


Solution 1:[1]

I had the same issue. IndexedSlices objects are created automatically for the gradients of some embedding matrices, namely those read directly through tf.nn.embedding_lookup, since only the rows that were looked up receive a gradient.

If you want the gradients of the embedding's trainable variables as ordinary dense tensors, convert the IndexedSlices to a tensor with:

tf.convert_to_tensor(gradients_of_the_embedding_layer)
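If you have already fetched the result with sess.run and are holding an IndexedSlicesValue, the same dense gradient can be rebuilt in NumPy from its three fields (values, indices, dense_shape). The sketch below is only an illustration of the idea: the helper name densify and the toy arrays are made up, and it assumes each row of values is the gradient contribution for the embedding row named by the matching entry of indices, with repeated indices summed. That would also explain the [4200,30] shape in the question: one row per lookup (30*20*7 lookups), not one row per embedding row.

```python
import numpy as np

def densify(values, indices, dense_shape):
    """Scatter-add sparse gradient rows into a dense array of shape dense_shape."""
    dense = np.zeros(tuple(dense_shape), dtype=values.dtype)
    # np.add.at accumulates rows that share an index; a plain
    # dense[indices] += values would keep only one contribution per index.
    np.add.at(dense, indices, values)
    return dense

# Hypothetical stand-in for one IndexedSlicesValue returned by sess.run:
# 6 lookups into an embedding matrix with 4 rows and 3 columns.
values = np.ones((6, 3), dtype=np.float32)
indices = np.array([0, 2, 2, 1, 0, 2])
dense_shape = np.array([4, 3])

dense_grad = densify(values, indices, dense_shape)
print(dense_grad.shape)   # same shape as the embedding matrix
print(dense_grad[:, 0])   # per-row sums of the contributions
```

With a real result you would call something like densify(g.values, g.indices, g.dense_shape) for each fetched gradient g that is an IndexedSlicesValue, and leave the plain numpy arrays untouched.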

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Dharman