How to do inference on a test dataset too large for RAM?
I'm training a network to classify audio. First I extract logmel-spectrograms from my audio data, save them in arrays, and train my network on them. At the end of each epoch I run inference on my test data to get an accuracy estimate.
My training dataset is 24 GB and my test dataset is 6 GB; both are too large to fit in RAM. I found that I could extract the logmel-spectrograms from my training data before running the network, save each minibatch in a pickle file, and then load them one by one during training.
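For reference, the chunking step looks roughly like this (a minimal sketch; the helper name pickle_minibatches and the batch_size argument are my own shorthand, but the file names match what the training loop below expects):

import os
import pickle

import numpy as np

def pickle_minibatches(X, y, batch_size, out_dir, prefix):
    """Split feature/label arrays into minibatches and pickle each pair."""
    num_batches = int(np.ceil(len(X) / batch_size))
    for i in range(num_batches):
        lo, hi = i * batch_size, (i + 1) * batch_size
        # Produces e.g. X_train_mini_batch_0 / y_train_mini_batch_0 for prefix='train'.
        with open(os.path.join(out_dir, 'X_' + prefix + '_mini_batch_' + str(i)), 'wb') as fp:
            pickle.dump(X[lo:hi], fp)
        with open(os.path.join(out_dir, 'y_' + prefix + '_mini_batch_' + str(i)), 'wb') as fp:
            pickle.dump(y[lo:hi], fp)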
However, I use .eval() to get the accuracy over my whole test dataset at once. This worked with smaller datasets, since there was no need to split the data into chunks across pickle files. Now I'm trying to figure out how to run the .eval() line (or an equivalent) so that it reports accuracy for the whole test dataset, rather than for the smaller chunks I've split it into. Is there a way to get overall accuracy for my test data using pickle files, or some other method?
Here is the key piece of code, at the end of my script, where I think this can be done:
correct = tf.equal(tf.argmax(logits, 1), tf.argmax(labels_input, 1))
test_accuracy = tf.reduce_mean(tf.cast(correct, 'float'))  # cast booleans to floats
test_accuracy1 = test_accuracy.eval({features_input: X_test, labels_input: y_test})
test_accuracy_scores.append(test_accuracy1)
print('Test accuracy:', test_accuracy1)
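One way to get a single overall figure without ever loading the whole test set: have the graph return the number of correct predictions per chunk rather than the per-chunk mean, accumulate that count over all the test pickle files, and divide by the total number of examples at the end. This also weights chunks of unequal size correctly. A minimal sketch, assuming the test data is pickled the same way as the training data (files X_test_mini_batch_0, y_test_mini_batch_0, and so on, with num_test_batches of them; those names are assumptions, not part of my current script):

# Build these ops once (outside the epoch loop) so the graph does not grow
# each time accuracy is evaluated.
correct = tf.equal(tf.argmax(logits, 1), tf.argmax(labels_input, 1))
correct_count = tf.reduce_sum(tf.cast(correct, tf.float32))  # corrects in one chunk

total_correct = 0.0
total_examples = 0
for j in range(num_test_batches):
    with open(pickle_files_dir + 'X_test_mini_batch_' + str(j), 'rb') as fp:
        X_chunk = pickle.load(fp)
    with open(pickle_files_dir + 'y_test_mini_batch_' + str(j), 'rb') as fp:
        y_chunk = pickle.load(fp)
    # Same session and feeds as the existing .eval() call, one chunk at a time.
    total_correct += correct_count.eval(
        {features_input: X_chunk, labels_input: y_chunk})
    total_examples += len(X_chunk)

test_accuracy1 = total_correct / total_examples  # overall test accuracy
test_accuracy_scores.append(test_accuracy1)
print('Test accuracy:', test_accuracy1)

This would drop into the epoch loop in place of the single .eval() over X_test and y_test.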
Here is the entire code block for the network:
### Train NN, output results
r"""This uses the VGGish model definition within a larger model which adds two
layers on top, and then trains this larger model.
We input log-mel spectrograms (X_train) calculated above with associated labels
(y_train), and feed the batches into the model. Once the model is trained, it
is then executed on the test log-mel spectrograms (X_test) and the accuracy is output.
Alongside .csv file with the predictions for each 0.96s chunk and their true
class is also output for the test data. Column1 = the logit for the first class,
Column2 = the logit for the scond class etc. The final column is the true class.
"""
num_min_batches = len(os.listdir(pickle_files_dir)) // 2  # two pickle files (X and y) per minibatch
os.chdir(scripts_directory)

def main(X):
  with tf.Graph().as_default(), tf.Session() as sess:
    # Define VGGish.
    embeddings = vggish_slim.define_vggish_slim(training=FLAGS.train_vggish)

    # Define a shallow classification model and associated training ops on top
    # of VGGish.
    with tf.variable_scope('mymodel'):
      # Add a fully connected layer with 100 units. Add an activation function
      # to the embeddings since they are pre-activation.
      num_units = 100
      fc = slim.fully_connected(tf.nn.relu(embeddings), num_units)

      # Add a classifier layer at the end, consisting of parallel logistic
      # classifiers, one per class. This allows for multi-class tasks.
      logits = slim.fully_connected(
          fc, _NUM_CLASSES, activation_fn=None, scope='logits')
      tf.sigmoid(logits, name='prediction')

    # Add training ops.
    with tf.variable_scope('train'):
      global_step = tf.train.create_global_step()

      # Labels are assumed to be fed as a batch of multi-hot vectors, with
      # a 1 in the position of each positive class label, and 0 elsewhere.
      labels_input = tf.placeholder(
          tf.float32, shape=(None, _NUM_CLASSES), name='labels')

      # Cross-entropy label loss. This expects the raw (pre-sigmoid) logits,
      # which is why no activation is applied to the classifier layer above.
      xent = tf.nn.sigmoid_cross_entropy_with_logits(
          logits=logits, labels=labels_input, name='xent')
      loss = tf.reduce_mean(xent, name='loss_op')
      tf.summary.scalar('loss', loss)

      # We use the same optimizer and hyperparameters as used to train VGGish.
      optimizer = tf.train.AdamOptimizer(
          learning_rate=vggish_params.LEARNING_RATE,
          epsilon=vggish_params.ADAM_EPSILON)
      train_op = optimizer.minimize(loss, global_step=global_step)

    # Initialize all variables in the model, then load the pre-trained
    # VGGish checkpoint.
    sess.run(tf.global_variables_initializer())
    vggish_slim.load_vggish_slim_checkpoint(sess, FLAGS.checkpoint)

    # The training loop.
    features_input = sess.graph.get_tensor_by_name(
        vggish_params.INPUT_TENSOR_NAME)

    validation_accuracy_scores = []
    test_accuracy_scores = []
    for epoch in range(num_epochs):
      epoch_loss = 0
      i = 0
      while i < num_min_batches:
        # Load one pickled minibatch of features and labels at a time.
        X_pickle_file = pickle_files_dir + 'X_train_mini_batch_' + str(i)
        with open(X_pickle_file, "rb") as fp:
          batch_x = pickle.load(fp)
        y_pickle_file = pickle_files_dir + 'y_train_mini_batch_' + str(i)
        with open(y_pickle_file, "rb") as fp:
          batch_y = pickle.load(fp)
        _, c = sess.run([train_op, loss],
                        feed_dict={features_input: batch_x, labels_input: batch_y})
        epoch_loss += c
        i += 1

      # Print the number of epochs completed and the loss.
      print('Epoch', epoch + 1, 'completed out of', num_epochs, ', loss:', epoch_loss)

      # Note: evaluating on the test set every epoch adds a small computational
      # cost. Building these ops inside the loop also adds new nodes to the
      # graph each epoch; defining them once outside the loop would avoid that.
      correct = tf.equal(tf.argmax(logits, 1), tf.argmax(labels_input, 1))
      test_accuracy = tf.reduce_mean(tf.cast(correct, 'float'))  # cast booleans to floats
      test_accuracy1 = test_accuracy.eval({features_input: X_test, labels_input: y_test})
      test_accuracy_scores.append(test_accuracy1)
      print('Test accuracy:', test_accuracy1)

if __name__ == '__main__':
  tf.app.run()
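Whichever way the evaluation loop is written, the aggregation has to weight chunks by their size: summing correct counts and dividing by the total number of examples gives the true overall accuracy, whereas naively averaging the per-chunk accuracies does not when chunks differ in length. A quick self-contained check with made-up numbers:

import numpy as np

# Hypothetical predicted and true classes for 7 test examples, split into
# three chunks of unequal size.
preds = np.array([0, 1, 1, 2, 0, 2, 1])
truth = np.array([0, 1, 2, 2, 0, 1, 1])
chunks = [slice(0, 3), slice(3, 5), slice(5, 7)]

whole = (preds == truth).mean()                                        # 5/7
counted = sum((preds[c] == truth[c]).sum() for c in chunks) / len(preds)
averaged = np.mean([(preds[c] == truth[c]).mean() for c in chunks])

assert np.isclose(whole, counted)       # count-based aggregation matches
assert not np.isclose(whole, averaged)  # mean of chunk means does not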
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow