Why does shuffle=False on the validation set give better results in the confusion matrix and classification report than shuffle=True?
If I set shuffle=False while creating the test or validation dataset,

```python
test_dataset = test_image_gen.flow_from_directory(test_path,
                                                  target_size=(125, 125),
                                                  batch_size=batch_size,
                                                  class_mode='binary',
                                                  shuffle=False)
```
then, when making predictions with predict_generator, I get a much better confusion matrix and classification report:
```
[[947  53]
 [ 25 975]]

              precision    recall  f1-score   support

           0       0.97      0.95      0.96      1000
           1       0.95      0.97      0.96      1000

    accuracy                           0.96      2000
   macro avg       0.96      0.96      0.96      2000
weighted avg       0.96      0.96      0.96      2000
```
But if I set shuffle=True, the results are very disheartening:
```python
test_dataset = test_image_gen.flow_from_directory(test_path,
                                                  target_size=(125, 125),
                                                  batch_size=batch_size,
                                                  class_mode='binary',
                                                  shuffle=True)
```
```
[[495 505]
 [477 523]]

              precision    recall  f1-score   support

           0       0.51      0.49      0.50      1000
           1       0.51      0.52      0.52      1000

    accuracy                           0.51      2000
   macro avg       0.51      0.51      0.51      2000
weighted avg       0.51      0.51      0.51      2000
```
Solution 1:[1]
The problem with shuffle=True on the validation set is that the generator reshuffles the data, so the order in which predictions come out no longer matches the order of the ground-truth labels (e.g. `test_dataset.classes`). The predictions themselves may well be correct, but each one is compared against the label at the wrong index, which yields the chance-level (~50%) results you observed.
Always use shuffle=True on the training set and shuffle=False on the validation and test sets.
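The effect is easy to reproduce with plain NumPy. Below is a minimal sketch using synthetic labels (not the question's data): a "model" that is 96% accurate looks like a coin flip as soon as its predictions are scored against labels read back in a shuffled order.

```python
import numpy as np

rng = np.random.default_rng(0)

# 2000 ground-truth binary labels, and predictions that are 96% correct
# (80 deliberately flipped errors).
y_true = rng.integers(0, 2, size=2000)
y_pred = y_true.copy()
flip = rng.choice(2000, size=80, replace=False)
y_pred[flip] ^= 1

acc_aligned = (y_true == y_pred).mean()
print(acc_aligned)  # 0.96

# If the generator reshuffles, the labels you compare against are in a
# different order than the predictions, so accuracy collapses to chance.
y_true_shuffled = rng.permutation(y_true)
acc_misaligned = (y_true_shuffled == y_pred).mean()
print(acc_misaligned)  # roughly 0.5
```

This is exactly the mismatch in the question: the 96%-accurate model did not get worse, the label order did.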
Original answer: accuracy-reduced-when-shuffle-set-to-true-in-keras-fit-generator
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Kartik Sikka |
