TensorFlow: should tf.GradientTape() only use tf.Variables?

I'm trying to write a reinforcement learning agent using TensorFlow. I'm wondering whether the states should be tf.Variables, or whether they can be NumPy arrays, for backpropagation using GradientTape. I'm not sure the gradients will be correct if my state/action arrays are NumPy arrays instead of TensorFlow tensors; I do know that the loss function returns a tf.Variable, however. Thanks. I'm still a beginner with TensorFlow, so any explanation or suggestions would help a lot.

In a very simplified form (not word for word), my code looks something like:

with tf.GradientTape() as tape:

    # actions/states are both lists of np arrays
    action = model.call(state)
    states.append(state)
    actions.append(action)

    loss = model.loss(states, actions)  # loss returns tf.Variable

# apply_gradients expects (gradient, variable) pairs
grads = tape.gradient(loss, model.variables)
model.optimizer.apply_gradients(zip(grads, model.variables))


Solution 1:[1]

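Hi Noob :) The optimizer.apply_gradients operation updates only the tf.Variables it is given, each paired with its gradient (here model.variables). Your states and actions are inputs, not parameters, so they do not need to be tf.Variables. NumPy arrays passed to TensorFlow ops are converted to constant tensors, and the gradients with respect to the model's variables are still correct, as long as everything between the model output and the loss is computed with TensorFlow ops rather than NumPy.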

Reference: https://www.tensorflow.org/api_docs/python/tf/GradientTape

Trainable variables (created by tf.Variable or tf.compat.v1.get_variable, where trainable=True is default in both cases) are automatically watched. Tensors can be manually watched by invoking the watch method on this context manager.
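To make the quoted passage concrete, here is a minimal sketch of the difference between a trainable variable (watched automatically) and a plain tensor (watched only on request). The names w, x and y are made up for illustration and are not from the question.

```python
import tensorflow as tf

w = tf.Variable(3.0)   # trainable variable: watched automatically
x = tf.constant(2.0)   # plain tensor: not watched unless we ask

with tf.GradientTape() as tape:
    tape.watch(x)      # explicitly track the constant tensor
    y = w * x * x      # y = w * x^2

# dy/dw = x^2, dy/dx = 2 * w * x
dy_dw, dy_dx = tape.gradient(y, [w, x])
```

Without the tape.watch(x) call, the gradient with respect to x would come back as None, while the gradient with respect to w would be unaffected.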


Edit: if you want to call the model to make a prediction from a NumPy array, that is possible. According to the documentation, the input of model.call() should be a tensor object. You can get a tensor from your NumPy array like this:

# state is a numpy array
tf_state = tf.constant(state)
model.call(tf_state)
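As a quick sanity check on the gradient question, the sketch below compares the gradient obtained from a NumPy input with the one obtained from the same data wrapped in tf.constant. The weight w and the helper grad_wrt_w are hypothetical stand-ins for a real model, used here only to keep the example small.

```python
import numpy as np
import tensorflow as tf

w = tf.Variable([[1.0], [2.0]])                    # hypothetical 2x1 weight
state = np.array([[3.0, 4.0]], dtype=np.float32)   # numpy input, shape (1, 2)

def grad_wrt_w(inputs):
    """Gradient of sum(inputs @ w) with respect to w."""
    with tf.GradientTape() as tape:
        loss = tf.reduce_sum(tf.matmul(inputs, w))
    return tape.gradient(loss, w)

g_numpy = grad_wrt_w(state)                # numpy array passed directly
g_tensor = grad_wrt_w(tf.constant(state))  # same data as a tensor
```

Both calls should give the same gradient, since the NumPy array is converted to a constant tensor when it enters the TensorFlow op.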

Of course, instead of creating a new tf.constant on each iteration of the training loop, you can first initialize a (non-trainable) tf.Variable and then just update its value with that of the NumPy array. Something like the following should work:

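# Reusable, non-trainable buffer for the current state
tf_state = tf.Variable(np.zeros_like(state), dtype=tf.float32, trainable=False)
for _ in range(n_train_iterations):  # n_train_iterations is an int
    state = get_new_numpy_state()
    tf_state.assign(state)  # copy the numpy values into the variable
    model.call(tf_state)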
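Putting the pieces together, a full training step with NumPy states and actions might look roughly like the sketch below. The network shape, the random data, and the cross-entropy loss are all placeholders chosen for illustration; they are not the asker's actual model or loss.

```python
import numpy as np
import tensorflow as tf

# Hypothetical tiny policy network: 4-dim state -> 2 action logits.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(2),
])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

states = np.random.rand(8, 4).astype(np.float32)  # numpy batch of states
actions = np.random.randint(0, 2, size=8)         # numpy batch of action ids

with tf.GradientTape() as tape:
    # Forward pass and loss are computed with TensorFlow ops inside the tape.
    logits = model(tf.convert_to_tensor(states))
    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=actions, logits=logits))

# Gradients are taken with respect to the model's variables only.
grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))
```

The states and actions stay NumPy arrays throughout; only the model's kernel and bias are tf.Variables, and only they are updated.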

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1