Confusion Between TensorFlow Operations and Python Operations

I've checked the related post "Tensorflow vs Numpy math functions", but I still have some confusion: when should we use a TensorFlow library function, and when should we use a Python library function? Both are often mixed in code, as shown below.

For example, why should we use tf.math.pow instead of the Python library's math.pow? Both warmup_percent_done and self.power appear to be scalars rather than TensorFlow tensors.

tf.math.pow(warmup_percent_done, self.power)

Also, does global_step_float / warmup_steps_float actually overload to the tf.math.divide operation?

source code link: https://github.com/NVIDIA/DeepLearningExamples/blob/master/TensorFlow2/LanguageModeling/BERT/optimization.py#L52

  def __call__(self, step):
    with tf.name_scope(self.name or 'WarmUp') as name:
      # Implements polynomial warmup. i.e., if global_step < warmup_steps, the
      # learning rate will be `global_step/num_warmup_steps * init_lr`.
      global_step_float = tf.cast(step, tf.float32)
      warmup_steps_float = tf.cast(self.warmup_steps, tf.float32)
      warmup_percent_done = global_step_float / warmup_steps_float
      warmup_learning_rate = (
          self.initial_learning_rate *
          tf.math.pow(warmup_percent_done, self.power))
      return tf.cond(global_step_float < warmup_steps_float,
                     lambda: warmup_learning_rate,
                     lambda: self.decay_schedule_fn(step),
                     name=name)

Update:

So, warmup_learning_rate must be a TensorFlow tensor object so that the __call__ function can return a tensor object. Two more questions:

  1. When calculating warmup_learning_rate, why don't we cast self.initial_learning_rate to a TensorFlow tensor object?

  2. Why should we cast these into TensorFlow tensor objects?

global_step_float = tf.cast(step, tf.float32)
warmup_steps_float = tf.cast(self.warmup_steps, tf.float32)



Solution 1:[1]

TensorFlow operations act on tensor objects, just as NumPy operations act on NumPy arrays.

Try to think about the difference between these implementations of the same function.

import math
import numpy as np

def sigmoid(x):
    # math.exp accepts only a single Python scalar
    return 1 / (1 + math.exp(-x))

def sigmoidnp(x):
    # np.exp broadcasts element-wise over arrays
    return 1 / (1 + np.exp(-x))

then try calling both on x = np.array([1, 2, 3, 4]). What do you expect?
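Calling them shows the difference (a quick sketch; the exact TypeError message depends on your NumPy version):

x = np.array([1, 2, 3, 4])

print(sigmoidnp(x))  # array([0.73105858, 0.88079708, 0.95257413, 0.98201379]) -- element-wise
print(sigmoid(x))    # raises TypeError: math.exp cannot handle a multi-element array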

The same holds for tf operations. They act on and return TensorFlow objects.
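The same contrast shows up with the tf.math.pow from the question: it accepts Python scalars, converts them to tensors, and returns a tf.Tensor, while math.pow returns a plain Python float. A minimal sketch:

import math
import tensorflow as tf

print(tf.math.pow(0.5, 2.0))  # tf.Tensor(0.25, shape=(), dtype=float32) -- scalars converted to a tensor
print(math.pow(0.5, 2.0))     # 0.25 -- a plain Python float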

In the example you are posting,

...
global_step_float = tf.cast(step, tf.float32)
warmup_steps_float = tf.cast(self.warmup_steps, tf.float32)
warmup_percent_done = global_step_float / warmup_steps_float
...

the variables global_step_float and warmup_steps_float are cast to tensors; hence the / operator is actually equivalent to tf.math.divide through Python operator overloading, in the same way that + on two lists performs list concatenation:

l1 = [1,2,3]
l2 = [4,5,6]

l = l1+l2

Here + is equivalent to l1.__add__(l2), so l is [1, 2, 3, 4, 5, 6].
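The tensor case works the same way; a minimal sketch confirming that / on tensors and the explicit tf.math.divide give the same result:

import tensorflow as tf

a = tf.constant(5.0)
b = tf.constant(2.0)

print(a / b)                 # tf.Tensor(2.5, shape=(), dtype=float32) -- overloaded operator
print(tf.math.divide(a, b))  # tf.Tensor(2.5, shape=(), dtype=float32) -- explicit form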

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

[1] Solution 1: Dennis