tf.data: How to load npy data for training?

I am trying to use tf.data.Dataset on a downloaded dataset of over 1 million audio spectrograms that I have already computed and stored on my hard disk as npy files. I have stored all the paths in a CSV file, which I read into a DataFrame to use with tf.data:

data_selected=pd.read_csv(data_path)
filename = data_selected['filename']
idx_frame=data_selected['index_frame']
labels_train=data_selected['class']

Then I use from_tensor_slices:

dataset = tf.data.Dataset.from_tensor_slices((filename, we_objects, idx_frame)).shuffle(len(spect_names))

I then wrote a parse function to load each spectrogram frame:

def parse_function(filename, label, idx_frame):
    Sxx = np.load(filename, allow_pickle=True)
    Sxx = Sxx[idx_frame].reshape(Sxx.shape[1], Sxx.shape[2], 1)
    return Sxx, label
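A side note on the loader itself: np.load reads the whole file even though only one frame is used. If the files hold plain numeric arrays (an assumption; it does not apply to pickled object arrays), memory mapping may reduce the I/O per sample. The following is a minimal sketch, and the name load_single_frame is hypothetical:

```python
import numpy as np


def load_single_frame(filename, idx_frame):
    # mmap_mode="r" maps the file instead of reading it all into memory.
    # Assumption: the file is a plain numeric array, not a pickled object array.
    Sxx = np.load(filename, mmap_mode="r")
    # Copy only the requested frame out of the memory map.
    frame = np.array(Sxx[idx_frame], dtype=np.float32)
    # Add a trailing channel axis: (height, width) -> (height, width, 1).
    return frame[..., np.newaxis]
```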

But when I do:

dataset = dataset.map(parse_function, num_parallel_calls= 'auto')

I got an error saying that filename is a tensor and not a string, so I wrapped the function in tf.numpy_function:

dataset = dataset.map(lambda item, lab, idx: tf.numpy_function(
      parse_function, [item, lab, idx], [tf.float32, tf.float32, tf.int32]),
      num_parallel_calls=tf.data.AUTOTUNE)
dataset.batch(batch_size)
dataset = dataset.prefetch(1)
options = tf.data.Options()
from tensorflow.data.experimental import AutoShardPolicy
options.experimental_distribute.auto_shard_policy = AutoShardPolicy.OFF
dataset = dataset.with_options(options)
dataset = dataset.cache()

This raises no errors, but then I call:

history = model.fit(dataset,
                    initial_epoch=i, epochs=end_epoch,
                    callbacks=[tbCallBack, model_checkpoint])

And I get this error:

 InvalidArgumentError: 3 root error(s) found.
  (0) Invalid argument:  pyfunc_14 returns 2 values, but expects to see 3 values.
     [[{{node PyFunc}}]]
     [[MultiDeviceIteratorGetNextFromShard]]
     [[RemoteCall]]
     [[IteratorGetNextAsOptional]]
     [[div_no_nan/ReadVariableOp/_270]]
  (1) Invalid argument:  pyfunc_14 returns 2 values, but expects to see 3 values.
     [[{{node PyFunc}}]]
     [[MultiDeviceIteratorGetNextFromShard]]
     [[RemoteCall]]
     [[IteratorGetNextAsOptional]]
     [[replica_1/angular_distance/weighted_loss/cond/switch_pred/_149/_88]]
  (2) Invalid argument:  pyfunc_14 returns 2 values, but expects to see 3 values.
     [[{{node PyFunc}}]]
     [[MultiDeviceIteratorGetNextFromShard]]
     [[RemoteCall]]
     [[IteratorGetNextAsOptional]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_51059]

Function call stack:
train_function -> train_function -> train_function
                                                          

Where is the error? Is this the right way to do it?

Thank you!
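The error message points at a mismatch in counts: parse_function returns two values (Sxx and label), while the Tout list passed to tf.numpy_function declares three dtypes. Below is a minimal sketch of one way to line the two up. It rests on assumptions not stated in the question: the class labels are numeric, each npy file holds an array of shape (frames, height, width), and the names load_frame and build_dataset are hypothetical. It also assigns the result of batch(), which the pipeline in the question discards.

```python
import numpy as np


def load_frame(filename, label, idx_frame):
    """Load one spectrogram frame; runs as plain Python inside tf.numpy_function."""
    # tf.numpy_function hands tf.string values over as bytes, so decode first.
    if isinstance(filename, bytes):
        filename = filename.decode("utf-8")
    Sxx = np.load(filename, allow_pickle=True)
    # Assumption: Sxx has shape (frames, height, width).
    frame = Sxx[idx_frame].reshape(Sxx.shape[1], Sxx.shape[2], 1)
    # Two return values, so Tout below must list exactly two dtypes.
    # Assumption: the label is numeric and can be cast to float32.
    return frame.astype(np.float32), np.float32(label)


def build_dataset(filenames, labels, frame_indices, batch_size):
    # Imported here so load_frame stays usable without TensorFlow.
    import tensorflow as tf

    def tf_load(filename, label, idx_frame):
        frame, label = tf.numpy_function(
            load_frame, [filename, label, idx_frame], [tf.float32, tf.float32]
        )
        # numpy_function drops static shape information; restore what is known.
        frame.set_shape([None, None, 1])
        label.set_shape([])
        return frame, label

    dataset = tf.data.Dataset.from_tensor_slices((filenames, labels, frame_indices))
    dataset = dataset.shuffle(len(filenames))
    dataset = dataset.map(tf_load, num_parallel_calls=tf.data.AUTOTUNE)
    dataset = dataset.batch(batch_size)  # batch() returns a new dataset; reassign it
    return dataset.prefetch(tf.data.AUTOTUNE)
```

Something like build_dataset(filename.tolist(), labels_train.tolist(), idx_frame.tolist(), batch_size) could then be passed to model.fit. The cache() call is left out here because caching a million spectrograms may not fit in memory; that is a judgment call, not a requirement.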



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
