GradientTape returning None when run in a loop

The following gradient descent loop is failing because the gradients returned by tape.gradient() are None the second time through the loop.

import tensorflow as tf

w = tf.Variable(tf.random.normal((3, 2)), name='w')
b = tf.Variable(tf.zeros(2, dtype=tf.float32), name='b')
x = tf.constant([[1., 2., 3.]])


for i in range(10):
  print("iter {}".format(i))
  with tf.GradientTape() as tape:
    # forward prop
    y = x @ w + b
    loss = tf.reduce_mean(y**2)
    print("loss is \n{}".format(loss))
    print("output- y is \n{}".format(y))
    # the variables stop being watched after the first iteration
    print(tape.watched_variables())
  
  # get the gradients that minimize the loss
  dl_dw, dl_db = tape.gradient(loss, [w, b])

  # descend the gradients
  w = w.assign_sub(0.001*dl_dw)
  b = b.assign_sub(0.001*dl_db)

The output:
iter 0
loss is 
23.328645706176758
output- y is 
[[ 6.8125362  -0.49663293]]
(<tf.Variable 'w:0' shape=(3, 2) dtype=float32, numpy=
array([[-1.3461215 ,  0.43708783],
       [ 1.5931423 ,  0.31951016],
       [ 1.6574576 , -0.52424705]], dtype=float32)>, <tf.Variable 'b:0' shape=(2,) dtype=float32, numpy=array([0., 0.], dtype=float32)>)
iter 1
loss is 
22.634033203125
output- y is 
[[ 6.7103477  -0.48918355]]
()

TypeError                                 Traceback (most recent call last)
c:\projects\pyspace\mltest\test.ipynb Cell 7' in <cell line: 1>()
     11 dl_dw, dl_db = tape.gradient(loss,[w,b]) 
     13 #descend the gradients
---> 14 w = w.assign_sub(0.001*dl_dw)
     15 b = b.assign_sub(0.001*dl_db)

TypeError: unsupported operand type(s) for *: 'float' and 'NoneType'

I checked the documentation, which explains the cases in which gradients can become None, but none of them seem to apply here.



Solution 1:[1]

This is because assign_sub returns a Tensor. In the line w = w.assign_sub(0.001*dl_dw) you are therefore overwriting w with a Tensor holding the new value. On the next iteration it is no longer a Variable and is not tracked by the gradient tape by default, which is why the gradient becomes None. (Tensors also do not have an assign_sub method, so that line would crash as well.)
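A minimal sketch of the failure mode (this assumes TF2 eager execution, as in the question; the variable and values are made up for illustration):

import tensorflow as tf

v = tf.Variable(2.0)
v = v.assign_sub(0.5)  # rebinds v to the return value, not the original Variable

with tf.GradientTape() as tape:
  loss = v * v

print(tape.watched_variables())  # () -- nothing was watched automatically
print(tape.gradient(loss, v))    # None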

Instead, simply write w.assign_sub(0.001*dl_dw), and the same for b. The assign functions update the variable in place, so no reassignment is necessary.
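Applied to the loop from the question (same shapes and learning rate), the corrected version looks like this:

import tensorflow as tf

w = tf.Variable(tf.random.normal((3, 2)), name='w')
b = tf.Variable(tf.zeros(2, dtype=tf.float32), name='b')
x = tf.constant([[1., 2., 3.]])

for i in range(10):
  with tf.GradientTape() as tape:
    y = x @ w + b
    loss = tf.reduce_mean(y**2)

  dl_dw, dl_db = tape.gradient(loss, [w, b])

  # update in place; do not rebind w and b to the return values
  w.assign_sub(0.001*dl_dw)
  b.assign_sub(0.001*dl_db)
  print("iter {}: loss {}".format(i, loss.numpy()))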

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 xdurch0