TensorFlow convolutional autoencoder doesn't converge with my data
My task is to create an autoencoder model that recognizes anomalies on a cardboard-like surface. I know that I can train an autoencoder on "good" samples only and later use it to recognize "bad" samples (i.e., anomalies).
I've built convolutional autoencoders in TensorFlow based on Building Autoencoders in Keras and PyImageSearch autoencoders. Both examples use the MNIST dataset and work perfectly there (the loss decreases and accuracy climbs to about 0.85). However, when I train either autoencoder on my own pictures, the models don't converge: the training loss (I tried both binary_crossentropy and mse, as used in those tutorials) gets stuck at some level, e.g. 3.0873e-04 or 0.0016 (depending on the loss and the way the data is normalized), and accuracy stays at 0 or something like 1.2618e-07. Below is the sample network architecture.
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.models import Model

input_layer = layers.Input(shape=(28, 28, 1))

# on-the-fly augmentation: random horizontal and vertical flips
data_augmentation = tf.keras.Sequential()
data_augmentation.add(tf.keras.layers.experimental.preprocessing.RandomFlip('horizontal'))
data_augmentation.add(tf.keras.layers.experimental.preprocessing.RandomFlip('vertical'))
x = data_augmentation(input_layer)

# encoder
x = layers.Conv2D(16, (3, 3), padding='same', activation='relu')(x)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
# the block below was added once and removed later to reduce network depth
x = layers.Conv2D(8, (3, 3), padding='same', activation='relu')(x)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), padding='same', activation='relu')(x)
x = layers.UpSampling2D((2, 2))(x)
# the block below was added once and removed later to reduce network depth - END

# decoder
x = layers.Conv2D(16, (3, 3), padding='same', activation='relu')(x)
x = layers.UpSampling2D((2, 2))(x)
decoded = layers.Conv2D(1, (3, 3), padding='same', activation='sigmoid')(x)
autoencoder = Model(input_layer, decoded)
My dataset consists of about 50k pictures of size 50x50 (resized to 28x28 for the model) showing tiny patches of a cardboard-like sheet: Example 1, Example 2, and here you can see the source 900x900 picture I used to create the 50x50 pictures. For the model, I convert them to grayscale.
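The cropping step itself is not shown above; roughly, the 50x50 patches are cut out of the 900x900 source as non-overlapping tiles like this (the file name below is just a placeholder):

import cv2

source = cv2.imread('source_900x900.jpg')  # placeholder path for the 900x900 picture
patch_size = 50
for row in range(0, source.shape[0], patch_size):
    for col in range(0, source.shape[1], patch_size):
        patch = source[row:row + patch_size, col:col + patch_size]
        cv2.imwrite('patch_{}_{}.jpg'.format(row, col), patch)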
I used two ways of normalizing the data: the first one (taken from the tutorials) was to divide the values by 255, and the second one was min-max normalization. The pixel values are in the range 59-168. Here is how I create the dataset:
import glob
import cv2
import numpy as np

imgs = [cv2.imread(fname) for fname in glob.glob('{}/*.jpg'.format(dir_with_pictures))]
imgs = [img for img in imgs if img is not None]  # drop pictures which OpenCV was not able to load for some reason
dataset = np.array([np.expand_dims(cv2.resize(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), (28, 28)), axis=-1) for img in imgs])
dataset = (dataset - np.min(dataset)) / (np.max(dataset) - np.min(dataset))  # one way of normalization, OR
dataset = dataset.astype('float32') / 255.0  # second way of normalization
And here is how I compile my model; it is later trained on my dataset, either on the whole dataset or on a training set created by splitting the dataset 80/20:
from tensorflow.keras.optimizers import Adam
autoencoder.compile(optimizer=Adam(learning_rate=1e-3), loss='mse', metrics=['accuracy'])
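For reference, the training step looks roughly like this, using scikit-learn's train_test_split for the 80/20 split just for illustration (the epochs and batch_size values are placeholders, not the exact ones I used):

from sklearn.model_selection import train_test_split

# 80/20 split; the autoencoder is trained to reconstruct its own input
x_train, x_val = train_test_split(dataset, test_size=0.2, random_state=42)
autoencoder.fit(x_train, x_train,
                epochs=50,       # placeholder value
                batch_size=128,  # placeholder value
                validation_data=(x_val, x_val))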
Is it possible that the items in my dataset are 'too similar' to each other and that's why I can't train a good model? Or might there be something else wrong with the dataset? What can I try to get a converging model? Should I pay attention to the accuracy metric at all?
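To check the "too similar" suspicion myself, I could measure how much the normalized images actually vary with a quick diagnostic like this (not part of the training code):

import numpy as np

# average squared distance of each image from the dataset-mean image;
# if this is already tiny, a reconstruction MSE near that value is trivially reachable
mean_img = dataset.mean(axis=0)
baseline_mse = np.mean((dataset - mean_img) ** 2)
print('per-pixel std of dataset:', dataset.std())
print('MSE of always predicting the mean image:', baseline_mse)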
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
