Keras model: high accuracy when run locally, fails to learn in Google Colab

I am doing classification on the citrus leaves dataset. I came up with a very basic model and trained it in a Jupyter notebook on my machine (via Anaconda). The exact same notebook, when uploaded to Google Colab, fails to learn. Is there any reason why this may be happening?

Here is the model architecture:

from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.models import Model

IN = Input(shape=(256, 256, 3))
x = Conv2D(filters=16, kernel_size=(2,2), padding='valid', activation='relu')(IN)
x = MaxPooling2D(pool_size=(2,2), padding='valid')(x)
x = Flatten()(x)
x = Dense(32, activation='relu')(x)
OUT = Dense(4, activation='softmax')(x)

model = Model(IN, OUT)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

Training info from the local Jupyter notebook:

Epoch 1/5
25/25 [==============================] - 5s 189ms/step - loss: 4.2004 - accuracy: 0.5061 - val_loss: 1.2161 - val_accuracy: 0.4130
Epoch 2/5
25/25 [==============================] - 5s 185ms/step - loss: 0.7212 - accuracy: 0.7183 - val_loss: 0.3342 - val_accuracy: 0.9076
Epoch 3/5
25/25 [==============================] - 5s 187ms/step - loss: 0.2895 - accuracy: 0.9148 - val_loss: 0.2535 - val_accuracy: 0.9185
Epoch 4/5
25/25 [==============================] - 5s 209ms/step - loss: 0.1987 - accuracy: 0.9530 - val_loss: 0.1470 - val_accuracy: 0.9565
Epoch 5/5
25/25 [==============================] - 5s 194ms/step - loss: 0.1189 - accuracy: 0.9739 - val_loss: 0.1015 - val_accuracy: 0.9728

Training info from the Google Colab notebook (the exact same file was uploaded and run):

Epoch 1/5
25/25 [==============================] - 14s 521ms/step - loss: 7.1605 - accuracy: 0.2052 - val_loss: 1.3863 - val_accuracy: 0.3207
Epoch 2/5
25/25 [==============================] - 13s 514ms/step - loss: 1.3845 - accuracy: 0.3461 - val_loss: 1.3827 - val_accuracy: 0.3207
Epoch 3/5
25/25 [==============================] - 13s 512ms/step - loss: 1.3803 - accuracy: 0.3461 - val_loss: 1.3789 - val_accuracy: 0.3207
Epoch 4/5
25/25 [==============================] - 13s 515ms/step - loss: 1.3761 - accuracy: 0.3461 - val_loss: 1.3751 - val_accuracy: 0.3207
Epoch 5/5
25/25 [==============================] - 13s 523ms/step - loss: 1.3720 - accuracy: 0.3461 - val_loss: 1.3716 - val_accuracy: 0.3207

For some reason the model on Google Colab does not learn; it simply predicts the same class over and over. I checked the data (train and test splits) and there is the same number of images in both the local Jupyter and the Google Colab runs. I also plotted images in Colab to make sure I am reading the data correctly.
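One thing worth checking in both environments is the pixel range the input pipeline actually produces. Loaders such as `tf.keras.utils.image_dataset_from_directory` (an assumption here, since the loading code isn't shown) yield float values in the raw [0, 255] range, not [0, 1]. A minimal NumPy sketch of the check, using synthetic stand-in data:

```python
import numpy as np

# Stand-in for one batch of images as a directory loader would yield them:
# float32 pixel values in the raw [0, 255] range (hypothetical data).
batch = np.random.default_rng(0).integers(0, 256, size=(8, 64, 64, 3)).astype("float32")

print("raw range:   ", batch.min(), "-", batch.max())

# Equivalent of a Rescaling(1./255) layer at the front of the model
scaled = batch / 255.0
print("scaled range:", scaled.min(), "-", scaled.max())
```

Running this check on one batch from each environment would quickly reveal whether the two pipelines feed the model differently scaled inputs.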



Solution 1:

There could be a version mismatch between the packages and libraries installed in your local Jupyter environment and in Google Colab.

However, you can add a Rescaling layer to your model, to scale the image values from [0, 255] into [0, 1], and get better results either way:

from tensorflow.keras.layers import Input, Rescaling, Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.models import Model

IN = Input(shape=(180, 180, 3))
x = Rescaling(1./255)(IN)
x = Conv2D(filters=16, kernel_size=(2,2), padding='valid', activation='relu')(x)
x = MaxPooling2D(pool_size=(2,2), padding='valid')(x)
x = Flatten()(x)
x = Dense(32, activation='relu')(x)
OUT = Dense(5, activation='softmax')(x)

model = Model(IN, OUT)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

model.fit(train_ds, validation_data=val_ds, epochs=10)

Output: (I ran this code using the same dataset and TF 2.8 in Google Colab)

Epoch 1/10
16/16 [==============================] - 4s 136ms/step - loss: 5.3675 - accuracy: 0.3279 - val_loss: 3.0912 - val_accuracy: 0.5620
Epoch 2/10
16/16 [==============================] - 3s 123ms/step - loss: 1.7097 - accuracy: 0.4590 - val_loss: 0.9390 - val_accuracy: 0.5868
Epoch 3/10
16/16 [==============================] - 2s 84ms/step - loss: 0.7489 - accuracy: 0.6537 - val_loss: 0.7554 - val_accuracy: 0.6446
Epoch 4/10
16/16 [==============================] - 2s 84ms/step - loss: 0.5755 - accuracy: 0.7930 - val_loss: 0.6572 - val_accuracy: 0.7438
Epoch 5/10
16/16 [==============================] - 4s 146ms/step - loss: 0.4751 - accuracy: 0.8176 - val_loss: 0.4761 - val_accuracy: 0.8512
Epoch 6/10
16/16 [==============================] - 3s 81ms/step - loss: 0.3723 - accuracy: 0.8730 - val_loss: 0.4328 - val_accuracy: 0.8347
Epoch 7/10
16/16 [==============================] - 2s 81ms/step - loss: 0.2530 - accuracy: 0.9324 - val_loss: 0.4154 - val_accuracy: 0.8264
Epoch 8/10
16/16 [==============================] - 2s 83ms/step - loss: 0.1919 - accuracy: 0.9508 - val_loss: 0.4151 - val_accuracy: 0.8347
Epoch 9/10
16/16 [==============================] - 2s 82ms/step - loss: 0.2155 - accuracy: 0.9119 - val_loss: 0.6157 - val_accuracy: 0.7025
Epoch 10/10
16/16 [==============================] - 2s 83ms/step - loss: 0.2145 - accuracy: 0.9221 - val_loss: 0.3746 - val_accuracy: 0.8347
<keras.callbacks.History at 0x7f6cf0132210>
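To see why the missing rescaling can stall training rather than merely slow it down: standard weight initializations expect roughly unit-scale inputs, so raw [0, 255] pixels make every pre-activation about 255 times larger, which saturates the softmax and inflates the initial loss (compare the 7.16 first-epoch loss in the question's Colab run). A small NumPy illustration, using a He-style initialization as an assumption about what Keras does by default:

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in = 4096
# He-style weight initialization, sized for roughly unit-scale inputs
w = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=fan_in)

scaled = rng.random(fan_in)   # inputs in [0, 1]
raw = scaled * 255.0          # the same inputs left unscaled

# The pre-activation is exactly 255x larger with raw pixels, so early
# gradients (and the initial loss) blow up in proportion.
print(abs(w @ scaled))
print(abs(w @ raw))
```

With the Rescaling layer in place, both environments feed the network values the initialization was designed for, which is consistent with the healthy training curve shown above.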

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: TFer2