Stratified train-test splitting a TensorFlow dataset

I am currently working with a fairly large image dataset, which I loaded using ImageDataGenerator from tensorflow.keras in Python. As the class distribution of my data is very imbalanced, I want to do a stratified train-test split, so that both splits keep the same class proportions, and possibly achieve a higher accuracy.

I know how to do a simple random train-test-split using ImageDataGenerator but I couldn't find any equivalent of the stratified train_test_split you can do in sklearn.

Is there any way to do a stratified train-test split of a tensorflow.data.Dataset? And if not, how do you deal with large imbalanced datasets? I would very much appreciate your help!

Here is the relevant code:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator()
dataset = datagen.flow_from_directory(
    path_images, 
    target_size=(ImageHeight, ImageWidth), 
    color_mode='rgb', 
    class_mode='sparse', 
    batch_size=BatchSize, 
    shuffle=True, 
    seed=Seed,
)
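
One possible direction, sketched here under assumptions rather than as a confirmed solution: split the list of file paths per class before building any generator or dataset, so that each class keeps roughly the same share in the train and test parts. The helper name stratified_split, the toy file names, and the 20% test fraction are all hypothetical; sklearn's train_test_split with its stratify argument should give an equivalent split on the same path and label lists.

```python
import random
from collections import defaultdict


def stratified_split(paths, labels, test_fraction=0.2, seed=0):
    """Split (path, label) pairs so each class keeps roughly the same
    proportion in the train part and the test part.
    Hypothetical helper, not part of TensorFlow or Keras."""
    # Group the file paths by their class label.
    by_class = defaultdict(list)
    for path, label in zip(paths, labels):
        by_class[label].append(path)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_class):
        items = by_class[label]
        rng.shuffle(items)
        n_test = round(len(items) * test_fraction)
        # Keep at least one sample on each side when the class has >= 2 samples.
        if len(items) >= 2:
            n_test = min(max(n_test, 1), len(items) - 1)
        test += [(p, label) for p in items[:n_test]]
        train += [(p, label) for p in items[n_test:]]

    # Shuffle so the classes are not grouped together in the output.
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test


# Toy, made-up data: 90 images of class 0 and 10 images of class 1.
paths = [f"img_{i}.png" for i in range(100)]
labels = [0] * 90 + [1] * 10
train, test = stratified_split(paths, labels, test_fraction=0.2, seed=42)
```

With the iterator from the code above, the paths and labels could probably be taken from dataset.filepaths and dataset.classes. The two resulting lists could then be put into a DataFrame for ImageDataGenerator.flow_from_dataframe, or passed to tf.data.Dataset.from_tensor_slices together with an image-loading map function.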

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
