Image augmentation with SMOTE oversampling as batches without running out of RAM
I am trying to train a neural network on an unbalanced dataset, and I am using Colab. I found this code on Kaggle, which uses the Keras ImageDataGenerator for augmentation and SMOTE to oversample the data:
Augmentation:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
ZOOM = [.99, 1.01]
BRIGHT_RANGE = [0.8, 1.2]
HORZ_FLIP = True
FILL_MODE = "constant"
DATA_FORMAT = "channels_last"
work_dr = ImageDataGenerator(rescale = 1./255, brightness_range=BRIGHT_RANGE, zoom_range=ZOOM, data_format=DATA_FORMAT, fill_mode=FILL_MODE, horizontal_flip=HORZ_FLIP)
train_data_gen = work_dr.flow_from_directory(directory=WORK_DIR, target_size=DIM, batch_size=6500, shuffle=False)
The author then calls next() on the iterator to load the images:
train_data, train_labels = train_data_gen.next()
print(train_data.shape, train_labels.shape)
Which gives the following output:
(6400, 176, 176, 3) (6400, 4)
At this point it has already consumed about 70% of my RAM on Colab, not to mention the time taken to load the images. Note that the batch size is set to 6500, which is very large, but if I set it to something like 32 or 64, only the first batch is loaded when I call next().
Then, to oversample the data, the author uses SMOTE:
from imblearn.over_sampling import SMOTE
# Perform over-sampling of the data, since the classes are imbalanced
sm = SMOTE(random_state=42)
train_data, train_labels = sm.fit_resample(train_data.reshape(-1, IMG_SIZE * IMG_SIZE * 3), train_labels)
train_data = train_data.reshape(-1, IMG_SIZE, IMG_SIZE, 3)
print(train_data.shape, train_labels.shape)
This should give the following output:
(12800, 176, 176, 3) (12800, 4)
But instead it overloads my memory and Colab crashes due to a RAM shortage. I am not very good at coding, so I am having difficulty implementing what I want: to feed batches of augmented and oversampled data to my neural network without loading the entire dataset at once, thus saving memory. Is there a way to do this? If so, could you please show me how?
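As a rough check on why this crashes, the sizes of the two arrays printed above can be estimated from their shapes. The sketch below assumes the pixels are stored as float32 (4 bytes each); if they end up as float64, every figure doubles. The helper name array_gib is made up for illustration.

```python
# Rough memory estimate for the arrays in the question.
# Assumption: values are float32 (4 bytes); float64 would double every figure.

def array_gib(n_images, height=176, width=176, channels=3, bytes_per_value=4):
    """Return the size in GiB of a dense image array with the given shape."""
    return n_images * height * width * channels * bytes_per_value / 2**30

before_smote = array_gib(6400)    # shape (6400, 176, 176, 3)
after_smote = array_gib(12800)    # shape (12800, 176, 176, 3)

print(f"before SMOTE: {before_smote:.2f} GiB")
print(f"after SMOTE:  {after_smote:.2f} GiB")
print(f"both held at once: {before_smote + after_smote:.2f} GiB")
```

Because the result of fit_resample is only assigned after the call returns, the input and output arrays exist at the same time, and this estimate does not count any working copies SMOTE makes internally.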
Solution 1:[1]
I came across the same problem. You can copy the code into a Kaggle notebook and run it there; it runs very smoothly on Kaggle. Hope this helps!
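If staying on Colab matters, one possible direction is to balance the classes per batch instead of oversampling the whole dataset up front. The sketch below is not SMOTE: it is plain random oversampling, drawing minority-class indices with replacement so that each batch is roughly balanced. Only indices are kept in memory. The function name balanced_batches and the toy labels are hypothetical and not taken from the Kaggle code.

```python
import random
from collections import defaultdict

def balanced_batches(labels, batch_size, n_batches, seed=42):
    """Yield lists of sample indices with classes drawn uniformly.

    Minority classes are oversampled by drawing their indices with
    replacement. Only indices are stored; images would be loaded per batch.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    classes = sorted(by_class)
    for _ in range(n_batches):
        batch = []
        for _ in range(batch_size):
            cls = rng.choice(classes)                 # pick a class uniformly
            batch.append(rng.choice(by_class[cls]))   # then a sample of that class
        yield batch

# Toy imbalanced labels (hypothetical): 90 samples of class 0, 10 of class 1.
labels = [0] * 90 + [1] * 10
batches = list(balanced_batches(labels, batch_size=32, n_batches=50))
minority_share = sum(labels[i] for b in batches for i in b) / (32 * 50)
print(f"minority share in sampled batches: {minority_share:.2f}")
```

In a real pipeline, each index batch would then be read from disk and augmented before being passed to the model; that wiring is left out here because it depends on how the files are organised.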
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Suru |
