Layers to be used after using a pretrained model: When to add GlobalAveragePooling2D()

I am using pretrained models to classify images. My question is what kind of layers I have to add after the pretrained model structure in my model, and why these two implementations differ. Specifically:

Consider two examples, one using the cats and dogs dataset:

One implementation can be found here. The crucial point is that the base model:

# Create the base model from the pre-trained model MobileNet V2
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')
base_model.trainable = False

is frozen, and a GlobalAveragePooling2D() layer is added before the final tf.keras.layers.Dense(1) layer. So the model structure looks like:

model = tf.keras.Sequential([
  base_model,
  global_average_layer,
  prediction_layer
])

which is equivalent to:

model = tf.keras.Sequential([
  base_model,
  tf.keras.layers.GlobalAveragePooling2D(),
  tf.keras.layers.Dense(1)
])

So they added not only a final Dense(1) layer, but also a GlobalAveragePooling2D() layer before it.
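To see why the pooling layer matters, here is a minimal NumPy sketch (not Keras itself) of the shapes involved. It assumes a 160x160 input, for which MobileNetV2 with include_top=False yields a 5x5x1280 feature map; the random arrays are stand-ins for real activations and Dense weights. A Keras Dense layer applied to a 4D tensor only acts on the last axis, so without pooling Dense(1) would give one score per spatial position rather than one per image.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for base_model's output on a batch of 2 images:
# (batch, height, width, channels), e.g. 5x5x1280 for 160x160 inputs (assumption).
feature_map = rng.standard_normal((2, 5, 5, 1280))

# Stand-ins for the kernel and bias of Dense(1).
kernel = rng.standard_normal((1280, 1))
bias = np.zeros(1)

# Dense(1) on a 4D input contracts only the last axis:
# one logit per spatial position, shape (2, 5, 5, 1).
no_pooling = feature_map @ kernel + bias

# GlobalAveragePooling2D: average over height and width, shape (2, 1280).
pooled = feature_map.mean(axis=(1, 2))

# Dense(1) on the pooled vector: one logit per image, shape (2, 1).
with_pooling = pooled @ kernel + bias

print(no_pooling.shape, with_pooling.shape)
```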

The other example uses the TF flowers dataset:

This implementation is different: no GlobalAveragePooling2D() layer is added.

import tensorflow as tf
import tensorflow_hub as hub
from tensorflow.keras import layers

feature_extractor_url = "https://tfhub.dev/google/tf2-preview/mobilenet_v2/feature_vector/2"

# Pretrained MobileNetV2 feature extractor from TF Hub, kept frozen
feature_extractor_layer = hub.KerasLayer(feature_extractor_url,
                                         input_shape=(224,224,3))
feature_extractor_layer.trainable = False

model = tf.keras.Sequential([
  feature_extractor_layer,
  layers.Dense(image_data.num_classes)
])

Here, image_data.num_classes is 5, one for each flower class.

I do not understand this. Why is this different? When should I add a GlobalAveragePooling2D() layer and when not? And which approach is better, i.e. what should I do?

I am not sure whether the reason is that cats and dogs is a binary classification problem while flowers is a multiclass classification problem, or whether the difference is that one implementation uses tf.keras.applications.MobileNetV2 to load MobileNetV2 while the other uses hub.KerasLayer to get the feature extractor. When I check the model in the first implementation:

I can see that the last layer is a relu activation layer.

When I check the feature_extractor:

model = tf.keras.Sequential([
  feature_extractor,
  tf.keras.layers.Dense(1)
])

model.summary()

I get the output:

So maybe the reason is also that I do not understand the difference between tf.keras.applications.MobileNetV2 and hub.KerasLayer. I know that hub.KerasLayer just gives me the feature extractor, but I still don't think I understand the difference between these two methods.

I cannot inspect the layers of the feature_extractor itself: neither feature_extractor.summary() nor feature_extractor.layers works. How can I inspect the layers here? And how can I know whether I should add GlobalAveragePooling2D or not?

Solution 1:^[1]

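I think the difference lies in the models' outputs. "https://tfhub.dev/google/tf2-preview/mobilenet_v2/feature_vector/2" already has the global pooling built in, so its output is a 1D feature vector per image (shape batch_size x num_features), and you simply can't apply Conv2D (or another pooling layer) to it.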

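The output of tf.keras.applications.MobileNetV2 with include_top=False is more complex: a 4D feature map of shape (batch, height, width, channels). That gives you more freedom in how to transform it, but it also means you need a layer such as GlobalAveragePooling2D to collapse it into a vector before the Dense layer.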
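A rule of thumb that follows from this is to look at the rank of the backbone's output shape. The helper below is a hypothetical sketch (not part of Keras or TF Hub); the example shapes assume a 160x160 input for the tf.keras.applications model and the 1280-dimensional vector of the TF Hub MobileNetV2 feature_vector module.

```python
from typing import Optional, Sequence


def head_needs_global_pooling(output_shape: Sequence[Optional[int]]) -> bool:
    """Hypothetical rule-of-thumb check on a backbone's output shape (batch axis included).

    Rank 4, (batch, height, width, channels): a spatial feature map, so add
    GlobalAveragePooling2D (or Flatten) before the Dense head.
    Rank 2, (batch, features): already a feature vector, so Dense can follow directly.
    """
    rank = len(output_shape)
    if rank == 4:
        return True
    if rank == 2:
        return False
    raise ValueError(f"unexpected backbone output rank {rank}: {tuple(output_shape)}")


# Shapes as Keras reports them, with None for the batch axis.
print(head_needs_global_pooling((None, 5, 5, 1280)))  # MobileNetV2, include_top=False
print(head_needs_global_pooling((None, 1280)))        # TF Hub feature_vector module
```

With tf.keras.applications you can read the shape from base_model.output_shape. A hub.KerasLayer has no summary() or layers, but you can call it on a dummy batch, e.g. feature_extractor_layer(tf.zeros([1, 224, 224, 3])).shape, and pass that shape to the helper.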

Solution 2:^[2]

Let's say there is a model that takes [1, 208, 208, 3] images and has 6 pooling layers with kernels [2, 2, 2, 2, 2, 7], which would result in a feature column of [1, 1, 1, 2048] per image for the 2048 filters in the last conv layer. Note how the last pooling layer accepts [1, 7, 7, 2048] inputs.

If we relax the constraints on the input image (which is typically the case for object detection models), then after the same set of pooling layers an image of size [1, 104, 208, 3] would produce a pre-last-pooling output of [1, 4, 7, 2048], and [1, 256, 408, 3] would yield [1, 8, 13, 2048]. These maps would carry about the same amount of information as the original [1, 7, 7, 2048], but the original 7x7 pooling layer would not produce a feature column of shape [1, 1, 1, N]. That is why we switch to a global pooling layer.
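The shape arithmetic in this example can be checked with a few lines of plain Python. The sketch assumes each pooling layer has a stride equal to its kernel size and SAME-style padding, so every spatial size is divided by the kernel and rounded up; that assumption is what makes 208 reach 7 (rather than 6) before the last pooling layer.

```python
import math

# The six pooling kernels from the example above.
POOL_KERNELS = [2, 2, 2, 2, 2, 7]


def spatial_size_before_last_pool(height, width, kernels=POOL_KERNELS):
    """Height/width after all but the last pooling layer.

    Assumes stride == kernel size and SAME padding, i.e. ceil division.
    """
    for k in kernels[:-1]:
        height, width = math.ceil(height / k), math.ceil(width / k)
    return height, width


for h, w in [(208, 208), (104, 208), (256, 408)]:
    print((h, w), "->", spatial_size_before_last_pool(h, w))
```

Only the 208x208 input ends up at 7x7, which the fixed 7x7 window collapses to 1x1; the 4x7 and 8x13 maps do not fit that window.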

In short, a global pooling layer is important if we don't have a strict restriction on the input image size (and don't resize the image as the first op in the model).
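A small NumPy sketch of that point: global average pooling reduces any channels-last feature map to one value per channel, so the pre-last-pooling maps from the example all become vectors of the same length. The arrays of ones are placeholders for real activations.

```python
import numpy as np


def global_average_pool(feature_map):
    """NumPy equivalent of GlobalAveragePooling2D for channels-last input:
    (batch, height, width, channels) -> (batch, channels)."""
    return feature_map.mean(axis=(1, 2))


# Pre-last-pooling maps from the example, for three different input sizes.
shapes = [(1, 7, 7, 2048), (1, 4, 7, 2048), (1, 8, 13, 2048)]
pooled = [global_average_pool(np.ones(s)) for s in shapes]
print([p.shape for p in pooled])  # [(1, 2048), (1, 2048), (1, 2048)]
```

Because the output length depends only on the number of channels, a Dense head placed after the global pooling layer works for every input size.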

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Dmitry Sokolov
Solution 2	y.selivonchyk
