TensorFlow using GPU on WSL2 is not learning

I am using Ubuntu 20.04 on WSL2 running on Windows 11. The code I execute is as follows:

import tensorflow as tf
from tensorflow import keras
import numpy as np

(X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(2500, activation='relu'),
    keras.layers.Dense(2000, activation='relu'),
    keras.layers.Dense(1500, activation='relu'),
    keras.layers.Dense(1000, activation='relu'),
    keras.layers.Dense(500, activation='relu'),
    keras.layers.Dense(10, activation='sigmoid')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.summary()  # summary() prints itself and returns None, so no print() needed

model.fit(X_train, y_train, epochs=500)

If I run the code on CPU the output is as follows:

Epoch 205/500
1875/1875 [==============================] - 80s 43ms/step - loss: 0.1887 - accuracy: 0.9466
Epoch 206/500
1875/1875 [==============================] - 79s 42ms/step - loss: 0.3433 - accuracy: 0.9484
Epoch 207/500
1875/1875 [==============================] - 79s 42ms/step - loss: 0.1987 - accuracy: 0.9690
Epoch 208/500
1875/1875 [==============================] - 80s 43ms/step - loss: 0.2632 - accuracy: 0.9582

But if I run the same code in a Docker container (tensorflow/tensorflow:latest-gpu-py3-jupyter), the output is as follows:

Epoch 205/500
60000/60000 [==============================] - 45s 752us/sample - loss: 9.5371 - accuracy: 0.0987
Epoch 206/500
60000/60000 [==============================] - 45s 749us/sample - loss: 9.5371 - accuracy: 0.0987
Epoch 207/500
60000/60000 [==============================] - 45s 749us/sample - loss: 9.5371 - accuracy: 0.0987
Epoch 208/500
60000/60000 [==============================] - 45s 745us/sample - loss: 9.5371 - accuracy: 0.0987

The accuracy is constant at 0.0987, which is chance level for 10 classes, so the model is not learning at all.
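A quick sanity check (a sketch added here, not part of the original question) is to inspect a few raw predictions after training; if every row is nearly identical, the network has collapsed:

# Hypothetical diagnostic, assuming model and X_test from the code above.
probs = model.predict(X_test[:5])
print(probs)                  # saturated sigmoids show rows of ~0s and ~1s
print(probs.argmax(axis=1))   # a constant argmax explains chance-level accuracy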

The installation was done following:

I did not get any errors during the installation process.
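One quick way to confirm that the container actually sees the GPU (a minimal check, not from the original post; tf.config.list_physical_devices requires TF 2.1+, while the older TF 1.x GPU images provide tf.test.is_gpu_available() instead):

import tensorflow as tf

# An empty list here means TensorFlow silently fell back to the CPU.
print(tf.config.list_physical_devices('GPU'))
print(tf.test.is_built_with_cuda())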

Another issue is that compiling the model on the GPU takes a very long time (more than 5 minutes).

Thanks in advance for any help/idea.



Solution 1:[1]

You should decrease the number of neurons and layers, and use softmax in the final Dense(10) layer, since you have a multiclass output. A sketch follows below.
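A minimal sketch of that suggestion (the specific layer sizes, the epoch count, and the added input scaling to [0, 1] are illustrative choices, not prescribed by the answer):

import tensorflow as tf
from tensorflow import keras

(X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()
X_train, X_test = X_train / 255.0, X_test / 255.0  # extra step: scale pixels to [0, 1]

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),    # far fewer neurons and layers
    keras.layers.Dense(10, activation='softmax')   # softmax for 10-class output
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(X_train, y_train, epochs=5)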

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: razimbres