Non-identical accuracy obtained from model.evaluate and from confusion matrix in multiclass classification

I am using transfer learning with an EfficientNetB0 model. During training, accuracy is about 98% on the training dataset and about the same on the validation dataset. After training, model.evaluate reports about the same accuracy. But when I construct the confusion matrix, it is horrible: not even close to 60%. The dataset used for evaluating and for constructing the confusion matrix is the same.
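
One thing worth checking (a minimal sketch, assuming val_data is a tf.data.Dataset built with image_dataset_from_directory) is whether val_data yields its labels in the same order on every pass, since the confusion-matrix code further down iterates over it separately for predictions and for labels:

import numpy as np

# Labels collected from two separate passes over the same dataset.
labels_pass_1 = np.concatenate([labels.numpy() for _, labels in val_data])
labels_pass_2 = np.concatenate([labels.numpy() for _, labels in val_data])

# If this prints False, the dataset reshuffles between iterations, so
# predictions and labels gathered in separate passes will not line up.
print(np.array_equal(labels_pass_1, labels_pass_2))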

My model is built like this:

class_names = train_data.class_names

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers.experimental import preprocessing

img_augmentation = Sequential(
    [
        preprocessing.RandomRotation(factor=0.15),
        preprocessing.RandomTranslation(height_factor=0.1, width_factor=0.1),
        preprocessing.RandomFlip(),
        preprocessing.RandomContrast(factor=0.1),
    ],
    name="img_augmentation",
)

inputs = tf.keras.layers.Input(shape=(IMAGE_HEIGHT, IMAGE_WIDTH, 3))
x = img_augmentation(inputs)

model = tf.keras.applications.EfficientNetB0(include_top=False, input_tensor=x, weights="imagenet")

# Freeze the pretrained weights
model.trainable = False
# Rebuild top
x = tf.keras.layers.GlobalAveragePooling2D(name="avg_pool")(model.output)
x = tf.keras.layers.BatchNormalization()(x)

x = tf.keras.layers.Dropout(0.3)(x)
x = tf.keras.layers.Dense(5, activation=tf.nn.relu)(x)

outputs = tf.keras.layers.Dense(len(class_names), activation="softmax", name="pred")(x)

# Compile
model = tf.keras.Model(inputs, outputs, name="EfficientNet")
model.compile(optimizer="adam", loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True), metrics=['accuracy'])

epochs = 25  
hist = model.fit(train_data, epochs=epochs, validation_data=val_data, verbose=2)

After training, I evaluate my model; the results are given below:

model.evaluate(val_data)

15/15 [==============================] - 4s 166ms/step - loss: 0.0225 - accuracy: 0.9915
[0.022472454234957695, 0.9914529919624329]


results = model.predict(val_data)
results

array([[ 1.0585669 , -1.1383011 , -1.2604154 ,  0.5603893 ,  2.3179712,
         0.06845354],
       [ 2.3666468 , -2.0391636 , -1.6342618 , -0.03171571,  1.4519197 ,
         0.90075827],
       [-3.1899905 , -2.5988445 ,  0.33065823,  3.1682515 ,  2.073614  ,
        -2.4776454 ],
       ...,
       [ 0.74174565,  1.5650615 , -0.4598316 , -4.722933  , -1.4944372 ,
         5.390856  ],
       [-2.03066   , -3.8371902 ,  2.588029  ,  0.5657372 , -0.18608397,
        -1.8834535 ],
       [ 3.8117955 , -4.238594  , -0.9616619 , -3.0951567 ,  2.2566497 ,
         3.1414502 ]], dtype=float32)

import numpy as np

predictions = [np.argmax(cls) for cls in results]
original = []
for image, label in val_data:
    original.extend([int(val) for val in label])

from sklearn import metrics

metrics.confusion_matrix(predictions, original)

which gives the following result:

array([[ 50,  61,  45,  56],
       [ 52,  63,  63,  66],
       [ 32,  60,  43,  49],
       [ 39,  42,  56,  59]], dtype=int64)
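
For comparison, here is a minimal sketch (again assuming val_data is a tf.data.Dataset) that gathers labels and predictions in a single pass, so any reshuffling between iterations cannot misalign them; note also that sklearn's confusion_matrix expects the true labels as the first argument:

import numpy as np
from sklearn.metrics import confusion_matrix

y_true, y_pred = [], []
for images, labels in val_data:
    # Predict batch by batch so labels and predictions stay paired.
    probs = model.predict(images, verbose=0)
    y_pred.extend(np.argmax(probs, axis=1))
    y_true.extend(labels.numpy())

# sklearn's signature is confusion_matrix(y_true, y_pred).
print(confusion_matrix(y_true, y_pred))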

