TF.Keras SparseCategoricalCrossentropy returns nan on GPU

I tried to train a UNet on GPU to produce a binary-classified image and got a nan loss on every epoch. Testing the loss function in isolation always returns nan.

Test case:

import tensorflow as tf
import tensorflow.keras.losses as ls

true = [0.0, 1.0]
pred = [[0.1, 0.9], [0.0, 1.0]]

tt = tf.convert_to_tensor(true)
tp = tf.convert_to_tensor(pred)

l = ls.SparseCategoricalCrossentropy(from_logits=True)
ret = l(tt,tp)

print(ret) #tf.Tensor(nan, shape=(), dtype=float32)

If I force my TF to run on the CPU (see Can Keras with Tensorflow backend be forced to use CPU or GPU at will?), everything works fine. And yes, my UNet fits and predicts correctly on CPU.
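For reference, the two usual ways to pin TensorFlow to the CPU (as covered in the linked question) are hiding the GPU via an environment variable or wrapping the computation in a device scope. A minimal sketch re-running the test case above on CPU:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"  # hide GPUs; must be set before TensorFlow initializes

import tensorflow as tf
import tensorflow.keras.losses as ls

tt = tf.convert_to_tensor([0.0, 1.0])
tp = tf.convert_to_tensor([[0.1, 0.9], [0.0, 1.0]])

# Alternatively, keep the GPU visible and pin just this computation to the CPU
with tf.device("/CPU:0"):
    ret = ls.SparseCategoricalCrossentropy(from_logits=True)(tt, tp)

print(ret)  # finite value when run on CPU, per the report above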

I checked several posts on the Keras GitHub, but they all point to problems with the compiled network, such as using an inappropriate optimizer with categorical crossentropy.

Any workaround? Am I missing something?



Solution 1:[1]

I had the same issue: my loss was a real number when I trained on CPU. I tried upgrading the TF version, but that didn't fix the problem. I finally fixed it by reducing the y dimension. My model output was a 2D array; when I reduced it to 1D, I got a real loss on GPU.
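The answer doesn't include code, but a minimal sketch of what "reducing the y dimension" might look like, assuming the extra dimension is a trailing size-1 axis on the target (y) array (the shapes and names here are illustrative, not taken from the answer):

import tensorflow as tf
import tensorflow.keras.losses as ls

y_true_2d = tf.constant([[0], [1]], dtype=tf.int32)   # labels with shape (2, 1)
y_pred = tf.constant([[2.0, 0.5], [0.1, 3.0]])        # logits with shape (2, 2)

# Drop the trailing size-1 axis so the labels are 1D, as the answer describes
y_true_1d = tf.squeeze(y_true_2d, axis=-1)            # shape (2,)

loss = ls.SparseCategoricalCrossentropy(from_logits=True)(y_true_1d, y_pred)
print(loss)  # finite scalar loss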

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Dani