Same output (different probability) from Keras Sequential binary image classification model
I am trying to build a binary image classification model using Keras. Unfortunately, I get the same predicted label every time: the probability for each test sample is different, but the predictions all favor one label.
The datasets are balanced, label L (n = 250) vs. label E (n = 250): 300 samples for training, 100 for validation, and 100 for testing, with no sample overlap among the groups.
After failing to predict the test dataset, I also ran prediction on the training dataset, meaning the model made predictions for the very samples it had just been trained on. I know that makes no sense as an evaluation, but it still produced the same output: Counter({0: 300}).
from keras.layers.core import Dense, Flatten, Dropout
from keras.layers.convolutional import Conv2D, MaxPooling2D, SeparableConv2D
import keras
from keras import layers
from skimage.transform import resize
import math
import os,random
import cv2
import numpy as np
import pandas as pd
from keras.models import Sequential
from sklearn.model_selection import KFold
from sklearn.metrics import confusion_matrix
from collections import Counter
import matplotlib.pyplot as plt
class DataGenerator(keras.utils.Sequence):
    def __init__(self, datas, batch_size=32, shuffle=True):
        self.batch_size = batch_size
        self.datas = datas
        self.indexes = np.arange(len(self.datas))
        self.shuffle = shuffle

    def __len__(self):
        # number of batches per epoch
        return math.ceil(len(self.datas) / float(self.batch_size))

    def __getitem__(self, index):
        batch_indexs = self.indexes[index * self.batch_size:(index + 1) * self.batch_size]
        batch_datas = [self.datas[k] for k in batch_indexs]
        X, y = self.data_generation(batch_datas)
        return X, y

    def on_epoch_end(self):
        if self.shuffle:
            np.random.shuffle(self.indexes)

    def data_generation(self, batch_datas):
        images = []
        labels = []
        for data in batch_datas:
            # read the image, scale pixel values to [0, 1], and resize
            image = resize(cv2.imread(data) / 255, (128, 128))
            images.append(image)
            # the class folder ("e" or "l") is two directories above the
            # file name, e.g. \e\train\img.png -> "e"
            class_name = os.path.basename(os.path.dirname(os.path.dirname(data)))
            if class_name == "e":
                labels.append(0)
            else:
                labels.append(1)
        return np.array(images), np.array(labels)
def create_model():
    model = Sequential()
    model.add(Conv2D(8, kernel_size=(3, 3),
                     input_shape=(128, 128, 3),
                     activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(16, kernel_size=(3, 3),
                     padding="same",
                     activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(32, kernel_size=(3, 3),
                     activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(units=64, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(units=1, activation='sigmoid'))
    model.compile(optimizer='sgd',
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model
e_train = []
e_test = []
l_test = []
l_train = []
# raw strings so "\t" in the paths is not read as a tab character
for file in os.listdir(r'\e\train'):
    e_train.append(os.path.join(r'\e\train', file))
for file in os.listdir(r'\e\test'):
    e_test.append(os.path.join(r'\e\test', file))
for file in os.listdir(r'\l\train'):
    l_train.append(os.path.join(r'\l\train', file))
for file in os.listdir(r'\l\test'):
    l_test.append(os.path.join(r'\l\test', file))
data_tr = np.array(e_train + l_train)  # array so KFold index arrays can be applied
data_te = e_test + l_test
g_te = DataGenerator(data_te)
seed = 42
kf = KFold(n_splits=4, shuffle=True, random_state=seed)
fold = 1
for train, test in kf.split(data_tr):
    model = create_model()
    g_tr = DataGenerator(data_tr[train])
    g_v = DataGenerator(data_tr[test])
    H = model.fit_generator(generator=g_tr, epochs=10,
                            validation_data=g_v, shuffle=False,
                            max_queue_size=10, workers=1)
    pred = model.predict(g_te, max_queue_size=10, workers=1, verbose=1)
    print(pred)
    # the probabilities differed, but the right column was always bigger
    # [[0.49817565 0.5018243 ]
    #  [0.4872172  0.5127828 ]
    #  [0.48092505 0.519075  ]
    predicted_class_indices = [np.argmax(probas) for probas in pred]
    print(Counter(predicted_class_indices))
    # the output was always the same
    # Counter({0: 100})
    fold = fold + 1
Any thoughts would be appreciated.
Solution 1
Solution:
Instead of solving a binary classification problem, convert it into a multi-class problem with two classes. The last layer then uses a softmax activation, which provides a probability distribution over the classes. Refer to this tutorial to understand the changes that need to be made.
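A minimal sketch of that change, keeping the architecture from the question and assuming the generator's integer labels (0/1) are left as-is, so `sparse_categorical_crossentropy` is used as the loss:

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential()
model.add(Conv2D(8, kernel_size=(3, 3), input_shape=(128, 128, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(16, kernel_size=(3, 3), padding="same", activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(units=64, activation='relu'))
model.add(Dropout(0.5))
# two output units with softmax instead of one sigmoid unit
model.add(Dense(units=2, activation='softmax'))
model.compile(optimizer='sgd',
              loss='sparse_categorical_crossentropy',  # labels stay integers 0/1
              metrics=['accuracy'])
```

Alternatively, the labels can be one-hot encoded and `categorical_crossentropy` used as the loss; the two are equivalent here.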
Explanation
You should not use the sigmoid activation in the output layer of the model while using relu activation in the intermediate layers. The vanilla ReLU ( Rectified Linear Unit ) activation is defined as

relu( x ) = max( 0 , x )

Hence, the range of the activation function is [ 0 , infinity ). On the other hand, considering a sigmoid activation,

sigmoid( x ) = 1 / ( 1 + e^( -x ) )

the range of the sigmoid function is ( 0 , 1 ). So, if a large signal ( > 0 ) is passed through the sigmoid function, the output will be very close to 1, i.e. a fully saturated firing. The output of the relu function can provide a large signal from the intermediate layers, hence producing a fully saturated firing ( all 1s ) at the output layer where the sigmoid activation is applied.
If the logits are [ 5.6 , 1.2 , 3.2 , 4.8 ], the output of the sigmoid function is

[0.9963157 , 0.76852477, 0.96083426, 0.99183744]

and that of the softmax is

[0.6441953 , 0.00790901, 0.05844009, 0.2894557 ]
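Those numbers can be reproduced with a few lines of NumPy (a quick illustrative check, not part of the original answer):

```python
import numpy as np

logits = np.array([5.6, 1.2, 3.2, 4.8])

# sigmoid squashes each logit independently, so every large logit
# saturates near 1
sigmoid = 1.0 / (1.0 + np.exp(-logits))
print(sigmoid)  # [0.9963157  0.76852477 0.96083426 0.99183744]

# softmax normalizes across the logits, so they compete for
# probability mass and sum to 1
softmax = np.exp(logits) / np.exp(logits).sum()
print(softmax)  # [0.6441953  0.00790901 0.05844009 0.2894557 ]
```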
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Shubham Panchal |