Is there a simple way to know which TensorFlow ops have a registered GPU kernel?

I have been trying to optimize some TensorFlow code that was quite memory-inefficient (it used large dense tensors containing very sparse information, which limited batch size and scalability) by making use of SparseTensors. After some struggle I finally came up with a decent solution, with a satisfactory speedup on CPU and very low memory usage, but when the time came to use a GPU I realized that the previous, memory-inefficient version is orders of magnitude faster...

Using TensorBoard profiling I discovered that two of the operations used in my "optimized" version only run on the CPU (namely UniqueV2 and sparse_dense_matmul), but I could not find any hint of that in the documentation.
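For reference, I collected the trace roughly along these lines (the log directory and the ops below are just placeholders, not my actual model code):

    import tensorflow as tf

    # Start the profiler, run a few representative steps, then stop it.
    tf.profiler.experimental.start("logs/profile")

    st = tf.sparse.from_dense([[0.0, 1.0], [2.0, 0.0]])
    dense = tf.ones([2, 2])
    out = tf.sparse.sparse_dense_matmul(st, dense)  # one of the ops in question

    tf.profiler.experimental.stop()

    # Loading "logs/profile" in TensorBoard's trace viewer then shows, per op,
    # whether it executed on the CPU or on a GPU stream.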

The only related piece of documentation states:

If a TensorFlow operation has no corresponding GPU implementation, then the operation falls back to the CPU device. For example, since tf.cast only has a CPU kernel, on a system with devices CPU:0 and GPU:0, the CPU:0 device is selected to run tf.cast, even if requested to run on the GPU:0 device.

Conversely, there is nothing in the tf.cast documentation hinting that the op has no GPU kernel.
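So far the only way I see to check is the one I would like to avoid: actually requesting the GPU and watching the placement logs. A minimal sketch of that approach (with soft device placement enabled so the run does not error out when the kernel is missing):

    import tensorflow as tf

    # Fall back to another device instead of failing when a kernel is missing,
    # and print the device every executed op is actually assigned to.
    tf.config.set_soft_device_placement(True)
    tf.debugging.set_log_device_placement(True)

    st = tf.sparse.from_dense([[0.0, 1.0], [2.0, 0.0]])
    dense = tf.ones([2, 2])

    with tf.device("/GPU:0"):
        # If the op has no GPU kernel, the log shows it being placed on CPU:0
        # despite the explicit /GPU:0 request.
        out = tf.sparse.sparse_dense_matmul(st, dense)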

Thus, is there a simple way to know whether a TF op has a registered GPU kernel, without having to use a GPU to find out?

The custom ops guide suggests that this could be determined by looking at the op's C++ source files, but that seems a rather cumbersome way to do it...
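To illustrate the kind of check I have in mind: the private tensorflow.python.framework.kernels module seems to expose the kernel registry, but I don't know whether relying on it is supported, and I assume it can only report the kernels compiled into the installed binary (so a CPU-only build presumably would not list GPU kernels anyway):

    from tensorflow.python.framework import kernels  # private API, may change

    def device_types_for_op(op_name):
        """Device types (e.g. CPU, GPU) with a registered kernel for op_name."""
        kernel_list = kernels.get_registered_kernels_for_op(op_name)
        return sorted({k.device_type for k in kernel_list.kernel})

    for op_name in ["Cast", "UniqueV2", "SparseTensorDenseMatMul"]:
        print(op_name, device_types_for_op(op_name))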

I'm using TF v2.8

Thanks!


