Training and validation accuracy are very high, but testing accuracy is low
The training accuracy and validation accuracy of my model are extremely high.
...
Epoch 4/5
457/457 [==============================] - 4s 8ms/step - loss: 0.0237 - accuracy: 0.9925 - val_loss: 0.0036 - val_accuracy: 0.9993
Epoch 5/5
457/457 [==============================] - 4s 8ms/step - loss: 0.0166 - accuracy: 0.9941 - val_loss: 0.0028 - val_accuracy: 0.9994
However, upon testing, the accuracy is atrocious:
[Confusion matrix image omitted: for high accuracy there would be a green diagonal from top-left to bottom-right.]
I am not sure why this is, given the high accuracy and low loss on both the training and validation sets. If the model were overfitting, then either the validation loss or the validation accuracy should deviate from the training loss or accuracy, but neither does. Here are my data generators:
train_datagen = DataGenerator(
partition["train"],
labels,
batch_size=BATCH_SIZE,
**params
)
val_datagen = DataGenerator(
partition["val"],
labels,
batch_size=BATCH_SIZE,
**params
)
test_datagen = DataGenerator(
partition["test"],
labels,
batch_size=1,
**params
)
Note that since my data takes the form of an npy array stored in a .npy file, I followed this post to create a custom DataGenerator class.
Here is my training process:
history = model.fit(
train_datagen,
epochs = 5,
steps_per_epoch = len(train_datagen),
validation_data = val_datagen,
validation_steps = len(val_datagen),
shuffle = False,
callbacks = callback,
use_multiprocessing = True,
workers = 4
)
Here you can see how I partitioned my data:
print(len(partition["train"]))
print(len(partition["val"]))
print(len(partition["test"]))
print(len(partition["train"]) + len(partition["val"]) + len(partition["test"]))
print(good, ok, bad)
# good: 0, ok: 1, bad: 2
29249
8342
4144
41735
18152 12665 10918
I also confirmed that there is no overlap between any of the sets:
print(bool(set(partition["train"]) & set(partition["val"])))
print(bool(set(partition["test"]) & set(partition["val"])))
print(bool(set(partition["train"]) & set(partition["test"])))
False
False
False
Can someone please help me figure out where I went wrong? I'm not sure how it's possible to get such high training and validation accuracy but such a terrible testing accuracy. I have hosted my full code and files on GitHub.
Solution 1:[1]
OK after a few hours of debugging I have found the issue.

My neural net successfully reached >99% accuracy on the testing dataset, as was reflected in the training and validation sets. Only 1 wrong prediction was made out of 4144.
The problem was that I had shuffle turned on for the test data generator, so when the predictions were compared to the non-shuffled list of correct classes that was generated at the start, the results looked completely random.
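The effect can be reproduced with a toy example (synthetic numbers, not the asker's data): a generator yields the test samples in a shuffled order, but the ground-truth list was built in the original order, so each prediction is scored against the wrong label and accuracy collapses to chance (~1/3 with three balanced classes) even for a perfect model.

```python
import numpy as np

# Ordered ground truth: 1000 samples each of classes 0, 1, 2.
true_labels = np.repeat([0, 1, 2], 1000)

# The order in which a shuffled generator actually yielded the samples.
rng = np.random.default_rng(0)
order = rng.permutation(len(true_labels))

# A "perfect" model: its predictions match the labels in the yielded order.
predictions = true_labels[order]

# Bug: comparing shuffled predictions to the unshuffled label list.
shuffled_acc = np.mean(predictions == true_labels)        # ~0.33, looks atrocious

# Fix: compare against labels reindexed into the generator's order
# (or simply disable shuffling for the test generator).
aligned_acc = np.mean(predictions == true_labels[order])  # 1.0
```

This is why the confusion matrix showed no diagonal: the model was right all along, but the rows and the expected columns were in different orders.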
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Jeff Chen |

