What is the difference between tensorflow_examples.tutorials.mnist and tensorflow.keras.datasets.mnist? Which one should I use?

I am a beginner working with the MNIST dataset. From various tutorials I have learned that there are a few key differences between TensorFlow 1.x and TensorFlow 2.x. I have installed TensorFlow 2, but most of the tutorials are written for TensorFlow 1, so I am using TensorFlow like this:

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

I can load the MNIST dataset using both tensorflow_examples.tutorials.mnist and tensorflow.keras.datasets.mnist, but the shapes of the resulting arrays are different.

  • For example, if I use the following code:

from tensorflow_examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
x_train, y_train, x_valid, y_valid = mnist.train.images, mnist.train.labels, \
                                         mnist.validation.images, mnist.validation.labels

print("Size of:")
print("- Training-set:\t\t{}".format(len(y_train)))
print("- Validation-set:\t{}".format(len(y_valid)))

print('x_train:\t{}'.format(x_train.shape))
print('y_train:\t{}'.format(y_train.shape))
print('x_valid:\t{}'.format(x_valid.shape))
print('y_valid:\t{}'.format(y_valid.shape))
  • This gives the following Output:
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
Size of:
- Training-set:     55000
- Validation-set:   5000
x_train:    (55000, 784)
y_train:    (55000, 10)
x_valid:    (5000, 784)
y_valid:    (5000, 10)

Data type: dtype=float32

[NOTE: Because I am using TensorFlow version 2, I get the following warning:

WARNING:tensorflow:From C:\Users\user\AppData\Local\Temp/ipykernel_15480/4179597870.py:8: read_data_sets (from tensorflow_examples.tutorials.mnist.input_data) is deprecated and will be removed in a future version.
Instructions for updating:
Please use alternatives such as: tensorflow_datasets.load('mnist')
WARNING:tensorflow:From C:\Users\user\anaconda3\envs\py38tf\lib\site-packages\tensorflow_examples\tutorials\mnist\input_data.py:296: _maybe_download (from tensorflow_examples.tutorials.mnist.input_data) is deprecated and will be removed in a future version.

]

So if I follow the warning and use tensorflow.keras.datasets.mnist to load the data instead, the shapes do not match.

  • Example:
from tensorflow.keras.datasets.mnist import load_data

(x_train, Y_train), (X_test, y_test) = load_data()

## Split the training data into training and validation sets,
## because we need training, validation, and test datasets

#Training Dataset and label
X_train = x_train[:55000]
y_train = Y_train[:55000]

#Validation Dataset and label
X_val = x_train[55000:]
y_val = Y_train[55000:]

print("- Training-set:\t\t{}".format(len(y_train)))
print("- Validation-set:\t{}".format(len(y_val)))

print('x_train:\t{}'.format(X_train.shape))
print('y_train:\t{}'.format(y_train.shape))
print('x_valid:\t{}'.format(X_val.shape))
print('y_valid:\t{}'.format(y_val.shape))
  • Output:
- Training-set:     55000
- Validation-set:   5000
x_train:    (55000, 28, 28)
y_train:    (55000,)
x_valid:    (5000, 28, 28)
y_valid:    (5000,)

Also, the data type is different: dtype=uint8

Please help me understand the difference and which one to use. When I build a neural network model and try to train it, I get an error because of this mismatch.
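
To make the mismatch concrete, here is a minimal sketch of how the Keras-style arrays, (N, 28, 28) uint8 images and (N,) integer labels, could be reshaped into the flattened float32 images and one-hot labels shown in the first output. The helper name to_flat_one_hot is hypothetical, and the random arrays are only a stand-in for what load_data() returns, so the snippet runs without downloading anything. Scaling pixel values by 1/255 is an assumption about the value range the model expects.

```python
import numpy as np

def to_flat_one_hot(images, labels, num_classes=10):
    """Hypothetical helper: Keras-style MNIST arrays -> flattened, one-hot layout."""
    # (N, 28, 28) uint8 in [0, 255] -> (N, 784) float32 in [0, 1]
    flat = images.reshape(len(images), -1).astype(np.float32) / 255.0
    # (N,) integer labels -> (N, num_classes) one-hot rows
    one_hot = np.eye(num_classes, dtype=np.float32)[labels]
    return flat, one_hot

# Synthetic stand-in for the arrays returned by load_data()
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
fake_labels = rng.integers(0, 10, size=(5,), dtype=np.uint8)

x_flat, y_one_hot = to_flat_one_hot(fake_images, fake_labels)
print(x_flat.shape, x_flat.dtype)  # shape (5, 784), dtype float32
print(y_one_hot.shape)             # shape (5, 10)
```

Applied to the real X_train, y_train, X_val and y_val arrays from the second example, the same call should give shapes matching the first output, (55000, 784) and (55000, 10) for training and (5000, 784) and (5000, 10) for validation.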



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
