'Model val_accuracy higher than test accuracy without dropout regularization

I recently created a machine learning of 810 training and 810 test images (27 classes) in order to identify ASL hand signs. I trained this model using an SGD optimizer with a 0.001 learning rate, 5 epochs, and categorical cross entropy loss. However, my validation accuracy is around 20% higher than my model test accuracy, and I'm not sure why. I've tried adjusting my model structure, optimizers, learning rate, and epochs - this never changes. Anyone have any ideas? Here's my model code:

import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
model = tf.keras.models.Sequential([
    
    tf.keras.layers.Conv2D(32, (3,3), activation='relu', input_shape=(150, 150, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    
    tf.keras.layers.Conv2D(128, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    
    tf.keras.layers.Conv2D(128, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    
    tf.keras.layers.Dense(27, activation='softmax')
])

model.summary()
from tensorflow.keras.optimizers import SGD
sgd = SGD(learning_rate=0.001, decay=1e-6, momentum=0.9, nesterov=True)

model.compile(loss = 'categorical_crossentropy',
              optimizer = sgd,
              metrics = ['accuracy'])

class myCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs={}):
        if(logs.get('accuracy')>0.95):
            print("\nReached >95% accuracy so cancelling training!")
            self.model.stop_training = True
        
callbacks = myCallback()
train_datagen = ImageDataGenerator(
      rescale=1./255,
      rotation_range=40,
      width_shift_range=0.2, # Shifting image width by 20%
      height_shift_range=0.2,# Shifting image height by 20%
      shear_range=0.2,       # Shearing across X-axis by 20%
      zoom_range=0.2,        # Image zooming by 20%
      horizontal_flip=True,
      fill_mode='nearest')

train_generator = train_datagen.flow_from_directory(
    "/content/drive/MyDrive/train_asl",
    target_size = (150, 150),
    class_mode = 'categorical',
    batch_size = 5)
validation_datagen = ImageDataGenerator(rescale=1./255)

validation_generator = validation_datagen.flow_from_directory(
    "/content/drive/MyDrive/test_asl",
    target_size = (150, 150),
    class_mode = 'categorical',
    batch_size = 5
)
import numpy as np
history = model.fit_generator(
      train_generator,
      steps_per_epoch = np.ceil(810/5),  # 2520 images = batch_size * steps
      epochs = 100,
      validation_data=validation_generator,
      validation_steps = np.ceil(810/5),  # 372 images = batch_size * steps
      callbacks=[callbacks],
      verbose = 2)


Solution 1:[1]

Validation and test accuracies could be different if the underlying data distribution of validation and test data is different or unbalanced.

You could ensure that the class distribution of the 27 classes is approximately the same in both validation and test sets, using stratified sampling techniques. You could also check whether the input data distributions are same/different during validation and testing, because 810 images is not a lot especially if you are not using transfer-learning.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Aravind G.