Machine Learning Test Data

I am working on image classification with a CNN. Can I use the validation data as test data, or should I split the data into three sets (train, validation, test)?



Solution 1:[1]

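Usually, you use validation data for model selection, that is, to find the best model and/or hyperparameters. Test data is used to estimate the real-world performance of the best model from the model selection step. You must not let any test data leak into the validation data or vice versa: once the test data has influenced model selection, you risk overfitting to it, and the test score becomes an overly optimistic estimate. For the same reason, the validation score of the selected model is not a reliable performance estimate on its own, so a separate test set is needed.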

Basically:

  1. Training phase: fit each candidate model on the train data.
  2. Model selection phase: compare the candidate models and hyperparameters on the validation data.
  3. Testing phase: evaluate the best model from the previous step on the test data to get a real-world performance estimate.
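The three phases above can be sketched roughly as follows. This is only an illustration, not a CNN: a tiny 1-D k-nearest-neighbour classifier stands in for the model so the example runs with the standard library alone. The toy dataset, the helper names (`make_toy_data`, `knn_predict`, `accuracy`), and the candidate values of `k` are all assumptions made for the sketch.

```python
import random

def make_toy_data(n, seed):
    """Hypothetical 1-D dataset: label is 1 when x > 0.5, with 10% label noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.1:  # flip roughly 10% of the labels
            y = 1 - y
        data.append((x, y))
    return data

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return int(2 * votes > k)

def accuracy(train, eval_set, k):
    """Fraction of eval_set classified correctly by k-NN fitted on train."""
    correct = sum(knn_predict(train, x, k) == y for x, y in eval_set)
    return correct / len(eval_set)

# Samples are generated independently, so contiguous slices are a fair split.
data = make_toy_data(600, seed=0)
train, val, test = data[:400], data[400:500], data[500:]

# 1. Training phase: for k-NN, "training" is simply storing the train data.
# 2. Model selection phase: choose the hyperparameter on validation data only.
candidates = [1, 5, 15, 45]  # odd values avoid tied votes
val_scores = {k: accuracy(train, val, k) for k in candidates}
best_k = max(val_scores, key=val_scores.get)

# 3. Testing phase: touch the test data once, with the already selected model.
test_accuracy = accuracy(train, test, best_k)
print(f"best k = {best_k}, test accuracy = {test_accuracy:.2f}")
```

With a real CNN, step 1 would be a training loop per candidate configuration, but the pattern is the same: the test data is only used after `best_k` (or the best architecture) has been fixed.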

All datasets should be nonoverlapping, and you should ideally not know about any properties of the test data.
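One way to get non-overlapping subsets is to shuffle the sample indices once and cut the shuffled list into three parts. The sketch below assumes a dataset of 1000 images addressed by index and a 70/15/15 split; the function name `three_way_split` and the fractions are illustrative choices, not something prescribed by the answer.

```python
import random

def three_way_split(n_samples, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Return disjoint (train, val, test) lists of sample indices."""
    if not 0 < val_fraction + test_fraction < 1:
        raise ValueError("val_fraction + test_fraction must be between 0 and 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    n_test = round(n_samples * test_fraction)
    n_val = round(n_samples * val_fraction)
    test_idx = indices[:n_test]
    val_idx = indices[n_test:n_test + n_val]
    train_idx = indices[n_test + n_val:]
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = three_way_split(1000)

# Every sample lands in exactly one subset.
assert set(train_idx).isdisjoint(val_idx)
assert set(train_idx).isdisjoint(test_idx)
assert set(val_idx).isdisjoint(test_idx)
assert len(train_idx) + len(val_idx) + len(test_idx) == 1000
```

Fixing the seed keeps the test set identical across runs, so images never drift from the test set into the train or validation set between experiments. For imbalanced classes, a stratified split per class may be preferable; that is not shown here.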

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1