How to get an image into an array, TensorFlow 1.9

So I have to use TensorFlow 1.9 for system-specific reasons. I want to train a CNN on a custom dataset consisting of images. The folder structure looks roughly like this:

./
  + circles
    - circle-0.jpg
    - circle-1.jpg
    - ...
  + hexagons
    - hexagon-0.jpg
    - hexagon-1.jpg
    - ...
  + ... 

So the example I have to work with uses MNIST and has these two particular lines of code:

mnist_dataset = tf.keras.datasets.mnist.load_data('mnist_data')
(x_train, y_train), (x_test, y_test) = mnist_dataset

In my work, I also have to use this data format, (x_train, y_train), (x_test, y_test), which seems to be quite common. As far as I have been able to find out, each of these pairs has the form (image_data, label), with shapes like ((60000, 28, 28), (60000,)), at least for the MNIST dataset. The image_data here is supposedly of dtype uint8 (according to this post). I also found that a tf.data.Dataset object appears to hold the kind of (image_data, label) tuples I need here.
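To make the format concrete, here is a minimal sketch that rebuilds the same nested tuple from zero-filled placeholder arrays. It is not the real MNIST data; the test split size of 10000 and the uint8 label dtype are what MNIST ships with, and the variable names simply mirror the two lines above.

```python
import numpy as np

# Zero-filled stand-ins with MNIST's shapes and dtypes (placeholder data only).
x_train = np.zeros((60000, 28, 28), dtype=np.uint8)  # 60000 images, 28 x 28 pixels each
y_train = np.zeros((60000,), dtype=np.uint8)         # one integer label per image
x_test = np.zeros((10000, 28, 28), dtype=np.uint8)
y_test = np.zeros((10000,), dtype=np.uint8)

# Same nesting as the value returned by tf.keras.datasets.mnist.load_data().
dataset = (x_train, y_train), (x_test, y_test)

# Indexing the first axis selects a single image and its label.
first_image = x_train[0]
first_label = y_train[0]
print(first_image.shape, first_image.dtype, first_label)
```

Read this way, the first axis counts images and the remaining two axes are the pixel rows and columns of one grayscale image.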


So far so good. But this raises a few questions that I have not been able to figure out yet, and I would appreciate your help with them:

  1. (60000, 28, 28) means 60,000 arrays of 28 x 28 image values, right?
  2. If 1. is right, how do I get my images (like in the directory structure I described above) into this format? Is there a function which yields an array that I can use like that?
  3. I know I need some kind of generator function which should get all the images with label, because in Tensorflow 1.9 the tf.keras.utils.image_dataset_from_directory() does not seem to exist yet.
  4. What do the labels actually look like? For example, with my directory structure, would I have something like this:

(A)

File             Label
circle-0.jpg     circle
circle-233.jpg   circle
hexagon-1.jpg    hexagon
triangle-12.jpg  triangle

or (B)

File             Label
circle-0.jpg     circle-0
circle-233.jpg   circle-233
hexagon-1.jpg    hexagon-1
triangle-12.jpg  triangle-12

In either case, would the respective image already be converted to the "(60000, 28, 28)" format? It seems as if I need to write all the functions myself, since there does not seem to be a ready-made function that turns a directory structure like mine into a dataset usable with TensorFlow 1.9, or is there? I know of tf.keras.preprocessing.image.ImageDataGenerator and its flow_from_directory() method, as well as image_dataset_from_directory, but none of them seem to give me the dataset tuple format I want.
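For reference, one possible way to build such tuples by hand is sketched below. It assumes NumPy and Pillow are installed, that every sub-folder name is one class (labelling scheme A, encoded as an integer index the way MNIST labels are), and that converting to grayscale and resizing to 28 x 28 is acceptable. The helper names load_image_folder and train_test_split are hypothetical, not TensorFlow functions.

```python
import os

import numpy as np
from PIL import Image


def load_image_folder(root, size=(28, 28)):
    """Load a folder-per-class image tree into (images, labels, class_names)."""
    # Assumption: every sub-directory of root is one class.
    class_names = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    images, labels = [], []
    for index, name in enumerate(class_names):
        class_dir = os.path.join(root, name)
        for fname in sorted(os.listdir(class_dir)):
            if not fname.lower().endswith((".jpg", ".jpeg", ".png")):
                continue  # skip anything that is not an image file
            with Image.open(os.path.join(class_dir, fname)) as img:
                # Grayscale and a fixed size, so all arrays share one shape.
                # Pillow takes (width, height); the array is (height, width).
                gray = img.convert("L").resize(size)
                images.append(np.asarray(gray, dtype=np.uint8))
            labels.append(index)  # the label is the class index, not the file name
    # np.stack raises ValueError if no images were found.
    return np.stack(images), np.asarray(labels, dtype=np.uint8), class_names


def train_test_split(x, y, test_fraction=0.2, seed=0):
    """Shuffle and split into ((x_train, y_train), (x_test, y_test))."""
    order = np.random.RandomState(seed).permutation(len(x))
    n_test = int(len(x) * test_fraction)
    test_idx, train_idx = order[:n_test], order[n_test:]
    return (x[train_idx], y[train_idx]), (x[test_idx], y[test_idx])
```

Under these assumptions, `x, y, names = load_image_folder("./")` followed by `(x_train, y_train), (x_test, y_test) = train_test_split(x, y)` would give arrays of shape (N, 28, 28) and (N,), with `names[label]` recovering the folder name. The arrays could then also be wrapped with tf.data.Dataset.from_tensor_slices((x_train, y_train)) if a Dataset object is needed.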

I would really appreciate any help!



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
