Training with almost 1 million spectrograms and transfer learning! Bottleneck in data_generator?

Hello, I am trying to fine-tune a pretrained neural network for sound classification. My dataset is pretty big: almost 1 million 1-second frames. I use a data generator to load the spectrograms that feed the network, but I think my method is slow and inefficient, since even though I have 2 GPUs I can't speed up the training. Here is how I load the spectrograms in the generator:

    def gen_spectrogram(self, filenames, idx_frames):
        spect_path = self.data_path + 'SpectrogramsPositivePatches/'
        X_data = []

        for f, index in zip(filenames, idx_frames):
            # load = numpy.load (assumed); reads the whole file for one frame
            Sxx = load(spect_path + f)
            # pick the requested 1-second frame and add a channel dimension
            X_data.append(Sxx[index].reshape(Sxx.shape[1], Sxx.shape[2], 1))

        return X_data
    def get_next(self, partition):
        if partition == 'train':
            cur_index = self.train_index
            # NB: this re-unzips the whole partition list on every batch
            columns = list(zip(*self.train))
        elif partition == 'test':
            cur_index = self.test_index
            columns = list(zip(*self.test))

        audio_files, idx_frame, label = columns[1], columns[2], columns[3]

        filenames = audio_files[cur_index: cur_index + self.batch_size]
        idx_frames = idx_frame[cur_index: cur_index + self.batch_size]
        labels = label[cur_index: cur_index + self.batch_size]
        X_data = self.gen_spectrogram(filenames, idx_frames)

        return np.array(X_data), np.array(labels)
    def next_train(self):
        while True:
            ret = self.get_next('train')
            self.train_index += self.batch_size
            if self.train_index > len(self.train) - self.batch_size:
                self.train_index = 0
                self.shuffle_data_by_partition('train')
            yield ret

    def next_test(self):
        while True:
            ret = self.get_next('test')
            self.test_index += self.batch_size
            if self.test_index > len(self.test) - self.batch_size:
                self.test_index = 0
                self.shuffle_data_by_partition('test')
            yield ret
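
One idea I had was to wrap next_train in a tf.data.Dataset so that batches are prefetched while the GPUs are busy. Here is my untested sketch (HEIGHT and WIDTH are placeholders for my real frame dimensions, gen is an instance of the class above, and I am assuming integer labels):

    import tensorflow as tf

    HEIGHT, WIDTH = 128, 128  # placeholder frame dimensions, not my real ones

    train_ds = tf.data.Dataset.from_generator(
        gen.next_train,  # the batch generator shown above
        output_signature=(
            tf.TensorSpec(shape=(None, HEIGHT, WIDTH, 1), dtype=tf.float32),
            tf.TensorSpec(shape=(None,), dtype=tf.int32),
        ),
    ).prefetch(tf.data.AUTOTUNE)  # prepare the next batch while the GPUs train

But as far as I understand, from_generator still runs my Python code in a single thread, so prefetch only overlaps loading with training; it does not parallelize the file reads themselves.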

So I am using a GLOBAL_BATCH_SIZE of 256 (128 per GPU) with a MirroredStrategy. Is there a way to speed up the training? Keras keeps estimating about 40 hours per epoch, even though only a single Dense layer is trainable, so I think the bottleneck is in the data generator. I read at https://keras.io/guides/distributed_training/ that I should use tf.data.Dataset, but I don't understand how to apply it to my case. Do you have an idea of how I can remove this bottleneck?
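
For reference, here is what I think a native tf.data pipeline would look like, built directly from the file lists instead of my generator. This is an untested sketch: all_files, all_idx and all_labels are hypothetical names for the full (unsliced) lists that my generator indexes into, and I assume the spectrograms are .npy files readable with numpy.load:

    import numpy as np
    import tensorflow as tf

    paths = [spect_path + f for f in all_files]  # full list of spectrogram files

    def load_frame(path, idx):
        # runs eagerly inside tf.py_function, so .numpy() is available
        Sxx = np.load(path.numpy().decode())
        frame = Sxx[idx.numpy()][..., np.newaxis]  # add the channel dimension
        return frame.astype(np.float32)

    def tf_load(path, idx, label):
        frame = tf.py_function(load_frame, [path, idx], tf.float32)
        frame.set_shape([None, None, 1])  # height/width unknown at graph time
        return frame, label

    ds = (tf.data.Dataset.from_tensor_slices((paths, all_idx, all_labels))
          .shuffle(10_000)
          .map(tf_load, num_parallel_calls=tf.data.AUTOTUNE)  # parallel loads
          .batch(256)   # GLOBAL_BATCH_SIZE; 128 per replica under MirroredStrategy
          .prefetch(tf.data.AUTOTUNE))

If I understand the guide correctly, calling model.fit(ds, ...) with the model built inside strategy.scope() should then split each global batch of 256 across the two GPUs automatically. Would something like this remove the bottleneck?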

Thank you



Sources

Source: Stack Overflow, licensed under CC BY-SA 3.0 (attribution per Stack Overflow's requirements).