Tricks in Keras Sequential model
I have a Keras CNN model and I'm trying to recreate the same model using the TensorFlow v1 API. But I noticed that even though the model structures are the same, their performance is very different: the Keras model always performs better. So I wonder if there are any well-known tricks or optimizations enabled by default in the Keras Sequential model.
The dataset I used is https://www.kaggle.com/trolukovich/food11-image-dataset. This is my Keras model; the parameters here are arbitrary.
model = models.Sequential()
model.add(layers.Conv2D(4, (8, 8),
                        padding='SAME',
                        activation='relu',
                        input_shape=(128, 128, 3),
                        kernel_initializer='random_normal'))
model.add(layers.MaxPooling2D((2, 2), padding='SAME'))
model.add(layers.Conv2D(8, (8, 8),
                        activation='relu',
                        padding='SAME',
                        kernel_initializer='random_normal'))
model.add(layers.MaxPooling2D((2, 2), padding='SAME'))
model.add(layers.Flatten())
model.add(layers.Dense(16, activation='relu', kernel_initializer='random_normal'))
model.add(layers.Dense(11, kernel_initializer='random_normal'))
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
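One detail about the initializers that may matter for comparison (a sketch of my understanding, worth double-checking against the Keras docs): I believe the string `'random_normal'` in Keras resolves to `RandomNormal(mean=0.0, stddev=0.05)`, i.e. far smaller weights than the unit-stddev default of `tfv1.random_normal`. A NumPy stand-in for the two scales, using the first conv kernel's shape:

```python
import numpy as np

rng = np.random.default_rng(0)

# Keras 'random_normal' is assumed here to mean stddev=0.05
# (my reading of the Keras defaults, not verified against the source).
keras_like = rng.normal(loc=0.0, scale=0.05, size=(8, 8, 3, 4))

# tfv1.random_normal samples with stddev=1.0 by default.
tfv1_like = rng.normal(loc=0.0, scale=1.0, size=(8, 8, 3, 4))

print(keras_like.std())  # roughly 0.05
print(tfv1_like.std())   # roughly 1.0
```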
And this is the model in the TensorFlow v1 API:
def assemble_graph(g):
    with g.as_default():
        # input pipeline
        image_paths = tfv1.placeholder("string", [None], "image_path")
        labels = tfv1.placeholder("int32", [None], "y")
        image_paths_ts = tf.convert_to_tensor(image_paths, dtype=tf.string)
        labels_ts = tf.one_hot(labels, 11, name="y_onehot", dtype=tf.int32)
        dataset = tf.data.Dataset.from_tensor_slices((image_paths_ts, labels_ts))

        def map_fn(path, label):
            image = tf.image.decode_jpeg(tf.io.read_file(path))
            image = tf.image.resize(image, [128, 128])
            image = tfv1.to_float(image) / 255.0
            return image, label

        dataset = dataset.map(map_fn).shuffle(5000).batch(128)
        iter = tfv1.data.make_initializable_iterator(dataset)
        x, y = iter.get_next()

        # model
        weights = {
            'W_conv1': tf.Variable(tfv1.random_normal([8, 8, 3, 4])),
            'W_conv2': tf.Variable(tfv1.random_normal([8, 8, 4, 8])),
            'W_fc': tf.Variable(tfv1.random_normal([32 * 32 * 8, 16])),
            'out': tf.Variable(tfv1.random_normal([16, 11]))
        }
        biases = {
            'b_conv1': tf.Variable(tfv1.zeros([4])),
            'b_conv2': tf.Variable(tfv1.zeros([8])),
            'b_fc': tf.Variable(tfv1.zeros([16])),
            'out': tf.Variable(tfv1.zeros([11]))  # for the output neurons
        }
        layer1 = maxpool2d(tfv1.nn.relu(conv2d(x, weights['W_conv1']) + biases['b_conv1']), 2)
        layer2 = maxpool2d(tfv1.nn.relu(conv2d(layer1, weights['W_conv2']) + biases['b_conv2']), 2)
        layer2_flat = tf.reshape(layer2, [-1, 32 * 32 * 8])
        layer3 = tfv1.nn.relu(tf.matmul(layer2_flat, weights['W_fc']) + biases['b_fc'])
        logits = tf.matmul(layer3, weights['out']) + biases['out']

        cross_entropy = tfv1.reduce_mean(
            tfv1.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=logits))
        optimizer = tfv1.train.AdamOptimizer().minimize(cross_entropy)
        return optimizer, cross_entropy
One issue I found with my tfv1 implementation was that the logits are huge in the first couple of iterations, which caused big loss numbers. After I applied batch norm the issue got fixed. But I didn't see big loss numbers in the Keras implementation, so this made me wonder if there are any hidden tricks that the Keras Sequential model uses.
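My guess at why batch norm masked the problem (a hypothetical sketch, not a confirmed mechanism): standardizing each feature over the batch discards whatever scale the initializer introduced, so the logits start small regardless of the weight stddev:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pre-activations blown up by a large-stddev initializer.
z = rng.normal(scale=90.0, size=(128, 16))

# Batch norm, training-mode sketch: per-feature standardization
# (ignoring the learnable gamma/beta, which start at 1 and 0).
z_bn = (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + 1e-5)

print(z.std(), z_bn.std())  # ~90 before, ~1 after normalization
```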
Thanks.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
