What is the difference between tensorflow_examples.tutorials.mnist and tensorflow.keras.datasets.mnist? Which one should I use?
I am a beginner working with the MNIST dataset. From various tutorials I have learned that there are a few key differences between TensorFlow 1.x and TensorFlow 2.x. I have installed TensorFlow 2, but most tutorials are written for TensorFlow 1, so I am using TensorFlow like this:
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
I can load the MNIST dataset using both tensorflow_examples.tutorials.mnist and tensorflow.keras.datasets.mnist, but the shapes of the resulting arrays are different.
- For example:
If I use the following code
from tensorflow_examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
x_train, y_train, x_valid, y_valid = mnist.train.images, mnist.train.labels, \
mnist.validation.images, mnist.validation.labels
print("Size of:")
print("- Training-set:\t\t{}".format(len(y_train)))
print("- Validation-set:\t{}".format(len(y_valid)))
print('x_train:\t{}'.format(x_train.shape))
print('y_train:\t{}'.format(y_train.shape))
print('x_valid:\t{}'.format(x_valid.shape))
print('y_valid:\t{}'.format(y_valid.shape))
- This gives the following Output:
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
Size of:
- Training-set: 55000
- Validation-set: 5000
x_train: (55000, 784)
y_train: (55000, 10)
x_valid: (5000, 784)
y_valid: (5000, 10)
Data type: dtype=float32
[NOTE: As I am using TensorFlow version 2, I am getting the following warning:
WARNING:tensorflow:From C:\Users\user\AppData\Local\Temp/ipykernel_15480/4179597870.py:8: read_data_sets (from tensorflow_examples.tutorials.mnist.input_data) is deprecated and will be removed in a future version.
Instructions for updating:
Please use alternatives such as: tensorflow_datasets.load('mnist')
WARNING:tensorflow:From C:\Users\user\anaconda3\envs\py38tf\lib\site-packages\tensorflow_examples\tutorials\mnist\input_data.py:296: _maybe_download (from tensorflow_examples.tutorials.mnist.input_data) is deprecated and will be removed in a future version.
]
So, if I move away from the deprecated loader and use tensorflow.keras.datasets.mnist to load the data instead, the shape does not match.
- Example:
from tensorflow.keras.datasets.mnist import load_data
(x_train, Y_train), (X_test, y_test) = load_data()
## Splitting the training dataset into training and validation datasets,
## because we need training, validation and test datasets
#Training Dataset and label
X_train = x_train[:55000]
y_train = Y_train[:55000]
#Validation Dataset and label
X_val = x_train[55000:]
y_val = Y_train[55000:]
print("- Training-set:\t\t{}".format(len(y_train)))
print("- Validation-set:\t{}".format(len(y_val)))
print('x_train:\t{}'.format(X_train.shape))
print('y_train:\t{}'.format(y_train.shape))
print('x_valid:\t{}'.format(X_val.shape))
print('y_valid:\t{}'.format(y_val.shape))
- Output:
- Training-set: 55000
- Validation-set: 5000
x_train: (55000, 28, 28)
y_train: (55000,)
x_valid: (5000, 28, 28)
y_valid: (5000,)
Also the datatype is different: dtype=uint8
Please help me understand the difference and which one to use, because when I build a neural network model and try to train it, I get an error caused by this mismatch.
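Judging from the shapes and dtypes printed above, the two loaders appear to differ only in representation: one gives flattened float32 images with one-hot labels, the other gives 28x28 uint8 images with integer labels. Below is a minimal sketch of converting the second layout into the first using NumPy only. The helper name to_flat_one_hot is hypothetical, the arrays are synthetic stand-ins rather than real MNIST data, and scaling pixel values by 1/255 is an assumption about how the float32 values were produced.

```python
import numpy as np


def to_flat_one_hot(images, labels, num_classes=10):
    """Convert (N, 28, 28) uint8 images and (N,) integer labels into
    (N, 784) float32 images in [0, 1] and (N, num_classes) one-hot labels.

    Hypothetical helper for illustration; not part of TensorFlow.
    """
    # Flatten each 28x28 image into one 784-element row, then scale to [0, 1]
    # (assumption: the float32 layout uses pixel / 255).
    flat = images.reshape(len(images), -1).astype(np.float32) / 255.0
    # Row i of the identity matrix is the one-hot vector for class i.
    one_hot = np.eye(num_classes, dtype=np.float32)[labels]
    return flat, one_hot


# Synthetic stand-in with the same shape and dtype as the load_data() output.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)
fake_labels = rng.integers(0, 10, size=(100,), dtype=np.uint8)

x_flat, y_one_hot = to_flat_one_hot(fake_images, fake_labels)
print(x_flat.shape, x_flat.dtype)        # (100, 784) float32
print(y_one_hot.shape, y_one_hot.dtype)  # (100, 10) float32
```

Applied to the X_train, y_train, X_val and y_val arrays from the second snippet, the same call should yield arrays with the (N, 784) and (N, 10) shapes that a model written for the first loader expects. Alternatively, the model's input and loss could be changed to accept 28x28 images and integer labels.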
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
