Does cache() counter the effects of shuffling input data between epochs?

Hopefully I've understood the process well enough that this question actually makes sense.

For training my model (time series), I do my preprocessing outside of the tf.data functions. I save the results as NumPy arrays split across several files (e.g. train_1.npy, ..., train_n.npy) and load them into model.fit using a generator.
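For context, the generator looks roughly like this (a hedged sketch: the file layout, the `split` naming, and the feature/target split within each row are assumptions, not the exact code):

```python
import glob
import random

import numpy as np

def sequence_generator(split):
    # from_generator passes string args as bytes, so decode if needed.
    split = split.decode() if isinstance(split, bytes) else split
    # Pick up the .npy files for this split and visit them in a random order.
    files = glob.glob(f"{split}_*.npy")
    random.shuffle(files)
    for path in files:
        data = np.load(path)
        for row in data:
            # Assumed layout: all columns but the last are features,
            # the last column is the target.
            yield row[:-1].astype("float32"), row[-1:].astype("float32")
```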

train_dataset = tf.data.Dataset.from_generator(generator=sequence_generator, args=['train'], output_types=(tf.float32, tf.float32))


train_dataset = train_dataset.cache().batch(BATCH).prefetch(tf.data.AUTOTUNE)

validation_dataset = validation_dataset.cache().batch(BATCH).prefetch(tf.data.AUTOTUNE)

test_dataset = test_dataset.batch(64)


The generator yields the train/val/test .npy files in a random order, to help avoid overfitting.

My question is: between epochs, does the .cache() call in my pipeline nullify the effect of randomising the order in which the .npy files are yielded?
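This can be checked directly with a toy generator standing in for the real sequence_generator (the generator name and element values here are made up for illustration). cache() materialises the elements on the first pass and replays that same sequence on every later pass, so the per-epoch randomisation in the generator is indeed lost:

```python
import random

import tensorflow as tf

def randomised_generator():
    # Yields 0..4 in a different random order on each call,
    # mimicking a generator that picks .npy files at random.
    values = list(range(5))
    random.shuffle(values)
    yield from values

ds = tf.data.Dataset.from_generator(
    randomised_generator,
    output_signature=tf.TensorSpec(shape=(), dtype=tf.int32),
)
cached = ds.cache()

# The first full iteration fills the cache; subsequent iterations
# replay the cached elements in the same, now-fixed order.
epoch_1 = [int(x) for x in cached]
epoch_2 = [int(x) for x in cached]
assert epoch_1 == epoch_2  # the order is frozen after the first epoch
```

If that is the behaviour in your pipeline too, the usual remedy is to add a .shuffle(buffer_size) *after* .cache(), since shuffle reshuffles on each iteration by default, restoring per-epoch randomisation on top of the cached data.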



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.