How many nodes should I have in the last layer of a neural network for binary classification?

I believed that a binary-classification problem always calls for exactly 1 node in the last layer, since that layer makes the final classification decision. However, the following code seems to contradict this.

Let's download the pizza/steak image dataset and prepare the data using ImageDataGenerator:

import zipfile
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Dropout, Conv2D, MaxPooling2D, Flatten
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.preprocessing import image_dataset_from_directory
from tensorflow.keras.applications import EfficientNetB0, resnet50
from tensorflow.keras.models import Sequential
import numpy as np
import pandas as pd

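# Notebook shell command (Jupyter/Colab): download the dataset archive
!wget https://storage.googleapis.com/ztm_tf_course/food_vision/pizza_steak.zip

# Unpack the archive into the current directory
with zipfile.ZipFile("pizza_steak.zip", "r") as zip_ref:
    zip_ref.extractall()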

train_directory = './pizza_steak/train/'
test_directory = './pizza_steak/test/'
IMAGE_SIZE = (224, 224)

image_data_generator = ImageDataGenerator(rescale=1. / 255,
                                          zoom_range=0.2,
                                          shear_range=0.2,
                                          rotation_range=0.2)

train_dt = image_data_generator.flow_from_directory(directory=train_directory,
                                                    class_mode='categorical',
                                                    batch_size=32,
                                                    target_size=IMAGE_SIZE)

test_dt = image_data_generator.flow_from_directory(directory=test_directory,
                                                   class_mode='categorical',
                                                   batch_size=32,
                                                   target_size=IMAGE_SIZE)

Then build and compile a neural network and fit it to the data:

model = Sequential()
model.add(Conv2D(filters=16, kernel_size=3, activation='relu'))
model.add(Conv2D(filters=16, kernel_size=3, activation='relu'))
model.add(MaxPooling2D())
model.add(Conv2D(filters=16, kernel_size=3, activation='relu'))
model.add(Conv2D(filters=16, kernel_size=3, activation='relu'))
model.add(MaxPooling2D())
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(train_dt,
          epochs=5,
          validation_data=test_dt,
          validation_steps=len(test_dt))

As you can see, the val_accuracy never rises above 0.5000, which is very bad!

Now, if you change only the last layer to model.add(Dense(2, activation='sigmoid')) and train the otherwise identical model, you get a far better result, such as val_accuracy: 0.8680.

How do I know how many nodes the last layer should have for a binary-classification model?



Solution 1:[1]

Thanks to @Dr.Snoopy; I am adding an answer here just to complete the question.

The key point is how image_data_generator.flow_from_directory() labels the data.

If we set class_mode='categorical', the targets are one-hot encoded, so the number of nodes in the last layer must equal the number of classes in the target. In my case the target is binary, so I need 2 nodes in the last layer.
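To see why a single sigmoid node gets stuck at exactly 0.5 with one-hot targets, here is a minimal NumPy sketch. It assumes the accuracy metric thresholds the predictions and compares them with the targets element by element, which is how I understand Keras' binary accuracy to behave. The prediction values and the helper name binary_accuracy are made up for illustration.

```python
import numpy as np

# One-hot targets, as class_mode='categorical' yields for two classes.
y_one_hot = np.array([[1.0, 0.0],
                      [0.0, 1.0],
                      [1.0, 0.0],
                      [0.0, 1.0]])

# Hypothetical sigmoid outputs of a Dense(1) layer: shape (batch, 1).
y_pred = np.array([[0.1], [0.8], [0.3], [0.4]])


def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Element-wise accuracy on thresholded predictions (illustrative helper)."""
    predicted = (y_pred > threshold).astype(float)
    # With one-hot targets, (batch, 1) broadcasts against (batch, 2).
    return (predicted == y_true).mean()


# Each row of a one-hot target holds one 0 and one 1, so a single thresholded
# prediction matches exactly one of the two entries, whatever its value is.
acc_one_hot = binary_accuracy(y_one_hot, y_pred)

# Index targets (0 or 1), as class_mode='binary' yields, match the (batch, 1) output.
y_index = y_one_hot[:, 1:2]
acc_index = binary_accuracy(y_index, y_pred)

print(acc_one_hot, acc_index)  # 0.5 0.75
```

Under that assumption the score with one-hot targets is 0.5 no matter what the network learns, which would explain the val_accuracy of 0.5000 in the question.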

However, if we use class_mode='binary', the targets are class indices (0 or 1), and a single node in the last layer is enough.
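The pairing between class_mode and the output layer can be written down as a small lookup. This is only a sketch: output_head_for is a hypothetical helper, not part of Keras, and it returns the conventional choices (softmax with categorical cross-entropy for one-hot targets) rather than the sigmoid setup used in the question.

```python
def output_head_for(class_mode, num_classes=2):
    """Suggest output units, activation and loss for a flow_from_directory class_mode.

    Hypothetical helper for illustration; the returned strings are standard
    Keras activation and loss names.
    """
    if class_mode == "binary":
        # Targets are a single 0/1 index per sample.
        return {"units": 1, "activation": "sigmoid",
                "loss": "binary_crossentropy"}
    if class_mode == "categorical":
        # Targets are one-hot vectors of length num_classes.
        return {"units": num_classes, "activation": "softmax",
                "loss": "categorical_crossentropy"}
    if class_mode == "sparse":
        # Targets are integer class indices, one per sample.
        return {"units": num_classes, "activation": "softmax",
                "loss": "sparse_categorical_crossentropy"}
    raise ValueError(f"unsupported class_mode: {class_mode!r}")


head = output_head_for("binary")
print(head)
# Possible use with the model from the question:
#   model.add(Dense(head["units"], activation=head["activation"]))
#   model.compile(loss=head["loss"], optimizer='adam', metrics=['accuracy'])
```

The same class_mode has to be used for both train_dt and test_dt so that training and validation targets have the same shape.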

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Jeff