How to extract data/labels back from TensorFlow dataset

There are plenty of examples of how to create and use TensorFlow datasets, e.g.

dataset = tf.data.Dataset.from_tensor_slices((images, labels))

My question is how to get the data/labels back from the TF dataset in numpy form. In other words, what would be the reverse operation of the line above, i.e. I have a TF dataset and want to get the images and labels back from it?



Solution 1:[1]

In case your tf.data.Dataset is batched, the following code will retrieve all the y labels:

y = np.concatenate([y for x, y in ds], axis=0)

Quick explanation: [y for x, y in ds] is a Python list comprehension. If the dataset is batched, this expression loops through each batch and collects each batch's y (a 1-D TF tensor) in a list. np.concatenate then takes this list of 1-D tensors (implicitly converting them to numpy arrays) and joins them along axis 0 to produce a single long vector. In short, it turns many short 1-D vectors into one long vector. Note: if your y is more complex, this answer will need some minor modification.
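As a minimal sketch of the same idea, the loop below collects both components in a single pass over a batched dataset. The toy arrays, their shapes, and the batch size are assumptions chosen only for illustration.

```python
import numpy as np
import tensorflow as tf

# Toy data standing in for real images and labels (assumed shapes).
images = np.arange(10 * 4, dtype=np.float32).reshape(10, 4)
labels = np.arange(10, dtype=np.int64)

# Batches of 3, so the final batch holds a single element.
ds = tf.data.Dataset.from_tensor_slices((images, labels)).batch(3)

# Iterate once and keep both components of every batch.
x_batches, y_batches = [], []
for x, y in ds:
    x_batches.append(x.numpy())
    y_batches.append(y.numpy())

# Joining along axis 0 undoes the batching, including the short last batch.
x_all = np.concatenate(x_batches, axis=0)
y_all = np.concatenate(y_batches, axis=0)
```

Collecting x and y in the same loop avoids iterating over the dataset twice, which matters when the input pipeline is expensive or shuffles on every pass.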

Solution 2:[2]

Supposing our tf.data.Dataset is called train_dataset, with eager execution enabled (the default in TF 2.x), you can retrieve images and labels like this:

for images, labels in train_dataset.take(1):  # only take first element of dataset
    numpy_images = images.numpy()
    numpy_labels = labels.numpy()
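  • the inline operation .numpy() converts tf.Tensors into numpy arrays
  • if you want to retrieve more elements of the dataset, just increase the number inside the take method. If you want all elements, just insert -1. Note that the loop above overwrites numpy_images and numpy_labels on every iteration, so append them to a list if you need to keep more than the last element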
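A hedged sketch of retrieving several elements at once: it uses as_numpy_iterator(), which yields numpy values directly, and gathers the results in a list. The placeholder arrays below are assumptions, not data from the question.

```python
import numpy as np
import tensorflow as tf

images = np.zeros((6, 2, 2), dtype=np.uint8)  # placeholder images
labels = np.array([0, 1, 2, 3, 4, 5])

train_dataset = tf.data.Dataset.from_tensor_slices((images, labels))

# take(3) keeps the first three elements; as_numpy_iterator() yields
# NumPy values, so no explicit .numpy() call is needed.
first_three = list(train_dataset.take(3).as_numpy_iterator())
numpy_labels = np.array([label for _, label in first_three])

# take(-1) keeps every element of the dataset.
all_labels = np.array(
    [label for _, label in train_dataset.take(-1).as_numpy_iterator()]
)
```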

Solution 3:[3]

If you are OK with keeping the images and labels as tf.Tensors, you can do

images, labels = tuple(zip(*dataset))

Think of the effect of the dataset as zip(images, labels). When we want to get images and labels back, we can simply unzip it.

If you need the numpy array version, convert them using np.array():

images = np.array(images)
labels = np.array(labels)
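The zip/unzip analogy can be illustrated without TensorFlow at all. In this sketch, plain Python lists stand in for the image and label tensors (the values are made up).

```python
# Plain Python lists stand in for the image and label tensors.
images = ["img0", "img1", "img2"]
labels = [0, 1, 0]

# A dataset of (image, label) pairs behaves like zip(images, labels).
pairs = list(zip(images, labels))

# zip(*pairs) transposes the pairs back into two tuples.
images_back, labels_back = tuple(zip(*pairs))
print(images_back)  # ('img0', 'img1', 'img2')
print(labels_back)  # (0, 1, 0)
```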

Solution 4:[4]

I think we get a good example here:

https://colab.research.google.com/github/tensorflow/datasets/blob/master/docs/overview.ipynb#scrollTo=BC4pEXtkp4K-

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

# where mnist_train is a tf dataset
mnist_train = tfds.load(name="mnist", split=tfds.Split.TRAIN)
assert isinstance(mnist_train, tf.data.Dataset)

mnist_example, = mnist_train.take(1)
image, label = mnist_example["image"], mnist_example["label"]

plt.imshow(image.numpy()[:, :, 0].astype(np.float32), cmap=plt.get_cmap("gray"))
print("Label: %d" % label.numpy())

So each individual component of the dataset can be accessed much like a dictionary entry. Presumably different datasets have different field names (Boston housing won't have an image, but might have 'features' and 'target' or 'price'):

cnn = tfds.load(name="cnn_dailymail", split=tfds.Split.TRAIN)
assert isinstance(cnn, tf.data.Dataset)
cnn_ex, = cnn.take(1)
print(cnn_ex)

returns a dict with keys such as 'article' and 'highlights', whose values are string tensors.
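The same dictionary-style access can be tried without downloading anything. The sketch below builds a small dict-structured dataset by hand; the field names 'features' and 'label' are hypothetical, and real tfds datasets define their own keys.

```python
import numpy as np
import tensorflow as tf

# Hypothetical field names; real tfds datasets define their own keys.
data = {
    "features": np.array([[5.1, 3.5], [4.9, 3.0], [6.2, 2.9]], dtype=np.float32),
    "label": np.array([0, 0, 1]),
}
ds = tf.data.Dataset.from_tensor_slices(data)

# element_spec lists the available keys without reading any data.
field_names = sorted(ds.element_spec.keys())

# Each element is a dict of tensors; collect one field across the dataset.
label_array = np.array([example["label"].numpy() for example in ds])
feature_array = np.stack([example["features"].numpy() for example in ds])
```

Inspecting element_spec first is a cheap way to discover the field names of an unfamiliar dataset before writing the extraction loop.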

Solution 5:[5]

Here is my own solution to the problem:

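def dataset2numpy(dataset, steps=1):
    """Helper function to get data/labels back from a TF dataset (TF 1.x graph mode)."""
    # Build a one-shot iterator and the op that fetches the next batch.
    iterator = dataset.make_one_shot_iterator()
    next_val = iterator.get_next()
    with tf.Session() as sess:
        for _ in range(steps):
            # Each run() call evaluates the op and returns numpy arrays.
            inputs, labels = sess.run(next_val)
            yield inputs, labels

Please note that this function yields the inputs/labels of one dataset batch at a time; steps controls how many batches are taken from the dataset. It relies on the TF 1.x API (tf.Session and make_one_shot_iterator), which is not available under those names in TF 2.x.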
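For TF 2.x with eager execution, a comparable generator can be sketched without a session. The name dataset_to_numpy and the toy dataset are assumptions made for this example; this is not the answer author's code.

```python
import numpy as np
import tensorflow as tf


def dataset_to_numpy(dataset, steps=1):
    """Yield up to `steps` (inputs, labels) batches as NumPy arrays (TF 2.x)."""
    for inputs, labels in dataset.take(steps).as_numpy_iterator():
        yield inputs, labels


# Toy batched dataset (assumed shapes, for illustration only).
ds = tf.data.Dataset.from_tensor_slices(
    (np.ones((8, 3), dtype=np.float32), np.arange(8))
).batch(4)

batches = list(dataset_to_numpy(ds, steps=2))
```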

Solution 6:[6]

This worked for me

features = np.array([list(x[0].numpy()) for x in list(ds_test)])
labels = np.array([x[1].numpy() for x in list(ds_test)])



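# NOTE: ds_test was created as follows. as_supervised=True makes each
# element a (features, label) tuple, which the indexing above relies on.
iris, iris_info = tfds.load('iris', with_info=True, as_supervised=True)
ds_orig = iris['train']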
ds_orig = ds_orig.shuffle(150, reshuffle_each_iteration=False)
ds_train = ds_orig.take(100)
ds_test = ds_orig.skip(100)

Solution 7:[7]

import numpy as np
import tensorflow as tf

batched_features = tf.constant([[[1, 3], [2, 3]],
                                [[2, 1], [1, 2]],
                                [[3, 3], [3, 2]]], shape=(3, 2, 2))
batched_labels = tf.constant([[0, 0],
                              [1, 1],
                              [0, 1]], shape=(3, 2, 1))
dataset = tf.data.Dataset.from_tensor_slices((batched_features, batched_labels))
classes = np.concatenate([y for x, y in dataset], axis=0)
unique = np.unique(classes, return_counts=True)
labels_dict = dict(zip(unique[0], unique[1]))
print(classes)
print(labels_dict)
# {0: 3, 1: 3}

Solution 8:[8]

TensorFlow's get_single_element() is now available and can be used to extract data and labels back from datasets.

This avoids having to generate and use an iterator via .map() or iter() (which could be costly for big datasets).

get_single_element() returns a tensor (or a tuple or dict of tensors) containing all the members of the dataset. For this to work, the whole dataset must first be batched into a single element.

This can be used to get features as a tensor-array, or features and labels as a tuple or dictionary (of tensor-arrays) depending upon how the original dataset was created.

Check this answer on SO for an example that unpacks features and labels into a tuple of tensor-arrays.
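A minimal sketch of this approach, assuming a TensorFlow version in which get_single_element() is a method of tf.data.Dataset and a dataset whose length is known and finite (so len(dataset) works). The toy arrays are made up for illustration.

```python
import numpy as np
import tensorflow as tf

images = np.random.rand(5, 2, 2).astype(np.float32)
labels = np.array([1, 0, 1, 1, 0])
dataset = tf.data.Dataset.from_tensor_slices((images, labels))

# Batch everything into one element, then pull that element out.
# len(dataset) works here because the cardinality is known and finite.
all_images, all_labels = dataset.batch(len(dataset)).get_single_element()

numpy_images = all_images.numpy()
numpy_labels = all_labels.numpy()
```

Because the whole dataset is materialized as one batch, this only suits datasets that fit in memory.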

Solution 9:[9]

https://www.tensorflow.org/tutorials/images/classification

import matplotlib.pyplot as plt

plt.figure(figsize=(10, 10))
for images, labels in train_ds.take(1):
  for i in range(9):
    ax = plt.subplot(3, 3, i + 1)
    plt.imshow(images[i].numpy().astype("uint8"))
    plt.title(class_names[labels[i]])
    plt.axis("off")

Solution 10:[10]

You can use TF Dataset method unbatch() to unbatch the dataset, then you can easily retrieve the data and the labels from it:

ds_labels=[]
for images, labels in ds.unbatch():
    ds_labels.append(labels) # or labels.numpy().argmax() for int labels
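The argmax variant mentioned in the comment can be sketched end to end as follows, assuming one-hot encoded labels. The toy data and batch size are assumptions for illustration.

```python
import numpy as np
import tensorflow as tf

# Toy one-hot labels for 5 samples and 3 classes (assumed for illustration).
images = np.zeros((5, 2, 2), dtype=np.float32)
one_hot = np.eye(3, dtype=np.float32)[[2, 0, 1, 1, 2]]
ds = tf.data.Dataset.from_tensor_slices((images, one_hot)).batch(2)

# unbatch() yields one (image, label) pair at a time, so a short
# final batch needs no special handling.
ds_labels = np.array([labels.numpy().argmax() for _, labels in ds.unbatch()])
```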

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2
Solution 3 happymacaron
Solution 4
Solution 5 Valentin
Solution 6 Sourcerer
Solution 7 XerCis
Solution 8 manisar
Solution 9 Imran
Solution 10 Youcef4k