Stratified train-test splitting a TensorFlow dataset
I am currently working with a fairly large image dataset, which I load using ImageDataGenerator from tensorflow.keras in Python. Because the class distribution of my data is very imbalanced, I want to do a stratified train-test split to possibly achieve a higher accuracy.
I know how to do a simple random train-test-split using ImageDataGenerator but I couldn't find any equivalent of the stratified train_test_split you can do in sklearn.
Is there any way to do a stratified train-test split of a tensorflow.data.Dataset?
And if not, how do you deal with large imbalanced datasets?
I would greatly appreciate your help!
Here is the relevant code:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator()
dataset = datagen.flow_from_directory(
    path_images,
    target_size=(ImageHeight, ImageWidth),
    color_mode='rgb',
    class_mode='sparse',
    batch_size=BatchSize,
    shuffle=True,
    seed=Seed,
)
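One possible direction, sketched here under assumptions rather than taken from an accepted answer: split the list of file paths by class before any generator or dataset is built, so the images themselves never have to be loaded for the split. The helper names `list_images` and `stratified_split` below are hypothetical, and the sketch assumes the usual `root/<class_name>/<image>` folder layout that `flow_from_directory` expects.

```python
import random
from collections import defaultdict
from pathlib import Path


def list_images(root, extensions=(".jpg", ".jpeg", ".png")):
    """Collect file paths and class labels from a root/<class_name>/<image> layout.

    Hypothetical helper: the folder name is used as the label.
    """
    paths, labels = [], []
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for image_file in sorted(class_dir.iterdir()):
            if image_file.suffix.lower() in extensions:
                paths.append(str(image_file))
                labels.append(class_dir.name)
    return paths, labels


def stratified_split(labels, test_size=0.2, seed=0):
    """Return (train_idx, test_idx) with roughly the same class ratio in both parts."""
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        # Keep at least one sample on each side when a class has 2+ samples.
        if len(indices) > 1:
            n_test = min(max(n_test, 1), len(indices) - 1)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    # Shuffle so the classes are not grouped together in the output order.
    rng.shuffle(train_idx)
    rng.shuffle(test_idx)
    return train_idx, test_idx


# Small demonstration with synthetic, imbalanced labels (no images needed).
demo_labels = ["cat"] * 90 + ["dog"] * 10
demo_train, demo_test = stratified_split(demo_labels, test_size=0.2, seed=42)
```

With real data, the two index lists would select rows from the output of `list_images(path_images)`. The selected paths and labels could then be placed in two pandas DataFrames and passed to `ImageDataGenerator.flow_from_dataframe`, or to `tf.data.Dataset.from_tensor_slices`. If scikit-learn is available, `train_test_split(paths, labels, stratify=labels)` should give an equivalent split on the same lists.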
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
