Tricks in Keras Sequential model

I have a Keras CNN model and I'm trying to recreate the same model using the TensorFlow v1 API. But I noticed that even though the model structures are the same, their performance is very different: the Keras model always performs better. So I wonder whether there are any well-known tricks or optimizations enabled by default in the Keras Sequential model.

The dataset I used is https://www.kaggle.com/trolukovich/food11-image-dataset. This is my Keras model; the parameters here are arbitrary.

model = models.Sequential()
model.add(layers.Conv2D(4, (8, 8), 
          padding='SAME',
          activation='relu', 
          input_shape=(128, 128, 3), 
          kernel_initializer='random_normal'))
model.add(layers.MaxPooling2D((2, 2),  padding='SAME'))
model.add(layers.Conv2D(8, (8, 8), 
          activation='relu', 
          padding='SAME', 
          kernel_initializer='random_normal'))
model.add(layers.MaxPooling2D((2, 2), padding='SAME'))
model.add(layers.Flatten())
model.add(layers.Dense(16, activation='relu', kernel_initializer='random_normal'))
model.add(layers.Dense(11, kernel_initializer='random_normal'))

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
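As a sanity check on the loss functions: `SparseCategoricalCrossentropy(from_logits=True)` computes the same quantity as one-hot softmax cross-entropy, so the difference in label encoding between the two implementations shouldn't explain the gap by itself. A minimal numpy check of that equivalence (the logits and labels here are made up):

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax along the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1], [0.3, 2.2, 0.9]])
labels = np.array([0, 1])

# Sparse form: -log p[true class], as in SparseCategoricalCrossentropy(from_logits=True).
sparse = -np.log(softmax(logits)[np.arange(2), labels])

# One-hot form: -sum(onehot * log p), as in softmax_cross_entropy_with_logits_v2.
onehot = np.eye(3)[labels]
dense = -(onehot * np.log(softmax(logits))).sum(axis=-1)
```

Both forms give identical per-example losses, so any performance difference must come from elsewhere.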

And this is the model in the TensorFlow v1 API:

def assemble_graph(g):
  with g.as_default():
    # code for input
    image_paths = tfv1.placeholder("string", [None], "image_path")
    labels = tfv1.placeholder("int32", [None], "y")

    image_paths_ts = tf.convert_to_tensor(image_paths, dtype=tf.string)
    labels_ts = tf.one_hot(labels, 11, name="y_onehot", dtype=tf.int32)
    dataset = tf.data.Dataset.from_tensor_slices((image_paths_ts, labels_ts))
    def map_fn(path, label):
      image = tf.image.decode_jpeg(tf.io.read_file(path))
      image = tf.image.resize(image, [128, 128])
      image = tfv1.to_float(image) / 255.0
      return image, label
  
    dataset = dataset.map(map_fn).shuffle(5000).batch(128)
    iterator = tfv1.data.make_initializable_iterator(dataset)
    x, y = iterator.get_next()

    # model
    weights = {
        'W_conv1': tf.Variable(tfv1.random_normal([8, 8, 3, 4])),
        'W_conv2': tf.Variable(tfv1.random_normal([8, 8, 4, 8])),
        'W_fc': tf.Variable(tfv1.random_normal([32*32*8, 16])),
        'out': tf.Variable(tfv1.random_normal([16, 11]))
    }

    biases = { 
        'b_conv1': tf.Variable(tfv1.zeros([4])),
        'b_conv2': tf.Variable(tfv1.zeros([8])),
        'b_fc': tf.Variable(tfv1.zeros([16])),
        'out': tf.Variable(tfv1.zeros([11])) #for the output neurons
    }
    layer1 = maxpool2d(tfv1.nn.relu(conv2d(x, weights['W_conv1']) + biases['b_conv1']), 2)
    layer2 = maxpool2d(tfv1.nn.relu(conv2d(layer1, weights['W_conv2']) + biases['b_conv2']), 2)
    layer2_flat = tf.reshape(layer2, [-1, 32*32*8])
    layer3 = tfv1.nn.relu(tf.matmul(layer2_flat, weights['W_fc'])+biases['b_fc'])
    logits = tf.matmul(layer3, weights['out'])+biases['out']

    cross_entropy = tfv1.reduce_mean(tfv1.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=logits))
    optimizer = tfv1.train.AdamOptimizer().minimize(cross_entropy)

    return optimizer, cross_entropy
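The snippet calls `conv2d` and `maxpool2d` without defining them. Here is one plausible definition that matches the shapes in the graph (stride-1 SAME convolution, k-by-k SAME max-pooling); this is an assumption on my part, sketched with `tf.compat.v1`:

```python
import tensorflow as tf

tfv1 = tf.compat.v1

def conv2d(x, W):
    # Stride-1 convolution with SAME padding, matching Keras Conv2D(padding='SAME').
    return tfv1.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def maxpool2d(x, k):
    # k x k max-pooling with stride k, matching Keras MaxPooling2D((k, k), padding='SAME').
    return tfv1.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1], padding='SAME')
```

With these definitions each pooling step halves the 128x128 input, which is consistent with the 32*32*8 flatten size used above.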

One issue I found with my tfv1 implementation was that the logits were huge in the first couple of iterations, which caused big loss values. After I applied batch normalization the issue was fixed. But I didn't see big loss values in the Keras implementation, so this made me wonder whether there are any hidden tricks that the Keras Sequential model uses.
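One concrete difference worth checking: the Keras `'random_normal'` initializer defaults to stddev=0.05, while `tfv1.random_normal` defaults to stddev=1.0, which by itself is enough to inflate the logits through two layers. A rough numpy illustration of the scale difference (the layer sizes mirror the fully connected tail of the model above; numpy stands in for TF here):

```python
import numpy as np

rng = np.random.default_rng(0)

def logit_scale(stddev):
    # Mirror the fully connected tail of the model: 32*32*8 -> 16 -> 11.
    x = rng.uniform(0.0, 1.0, size=(1, 32 * 32 * 8))       # pixel-scaled activations
    w_fc = rng.normal(0.0, stddev, size=(32 * 32 * 8, 16))
    w_out = rng.normal(0.0, stddev, size=(16, 11))
    h = np.maximum(x @ w_fc, 0.0)                          # ReLU
    logits = h @ w_out
    return np.abs(logits).max()

keras_like = logit_scale(0.05)   # Keras 'random_normal' default stddev
tfv1_like = logit_scale(1.0)     # tfv1.random_normal default stddev
print(keras_like, tfv1_like)
```

Because the weight scale enters once per layer, the stddev=1.0 version produces logits roughly (1.0/0.05)^2 = 400 times larger, which matches the "huge logits in the first iterations" symptom.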

Thanks.



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
