Why is the loss function (MSE) calculated by Keras not the same as mine?

I want to test the MSE loss function in Keras by computing it myself, but the calculated answers are different. The definition of MSE is here: https://en.wikipedia.org/wiki/Mean_squared_error
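Written out, for n samples with targets y_i and predictions \hat{y}_i, that definition is:

\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2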

The test code is below:

from keras.datasets import boston_housing
import numpy as np

(train_data, train_targets), (test_data, test_targets) = boston_housing.load_data()

x_train = train_data.astype(np.float32)

from keras import models
from keras import layers

# a small regression network: 13 input features -> 1 predicted value
model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(13,)))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(1))
model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])

y_train = train_targets.astype(np.float32)
# y_test = test_targets.astype(np.float32)

# one epoch, one full batch (404 training samples)
model.fit(x_train, y_train, epochs=1, batch_size=404)

# MSE computed by hand from the trained model's predictions
print(np.mean((y_train - model.predict(x_train).ravel()) ** 2))

Keras reports a loss of around 816, but computing the MSE from its definition gives around 704. Why are the results different?
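For comparison, model.evaluate recomputes the loss with the model's current (post-training) weights. A minimal check, reusing the objects defined above:

# with metrics=['mae'], evaluate returns [loss, mae]
loss, mae = model.evaluate(x_train, y_train, batch_size=404)
print(loss)  # compare against the hand-computed MSE above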



Solution 1:[1]

You took the mean of the squared differences over the whole array in one shot. You need to take the mean first within each sample (over its output values) and then over the list of samples: do the difference one sample at a time, not on the whole list of samples at once.

Here's the implementation of MSE for a single sample:

def my_mse(y_true, y_pred):
    # mean of the squared differences for one sample
    return np.mean(np.square(y_pred - y_true))

Do this for each sample, one at a time, and then take the mean over the samples, as in the sketch below.
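A minimal sketch of that procedure, reusing my_mse and the model, x_train, y_train from the question:

preds = model.predict(x_train).ravel()
# per-sample squared error first, then the mean over all 404 samples
per_sample = [my_mse(t, p) for t, p in zip(y_train, preds)]
print(np.mean(per_sample))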

Solution 2:[2]

The code you have used is actually doing matrix subtraction; try this code to calculate the MSE:

sub_sqr = 0
# accumulate the squared error one sample at a time
for a, b in zip(y_train, model.predict(x_train)):
    sub_sqr += (a - b) ** 2

print(f"mse : {sub_sqr / len(y_train)}")

If you increase the number of epochs and let the model converge, your MSE will be almost the same as the one Keras reports; otherwise some difference is to be expected.

With model.fit(x_train, y_train, epochs=4000, batch_size=404):

1/1 [==============================] - 0s 2ms/step - loss: 7.3889 - mse: 7.3889

mse : [7.54777]

Solution 3:[3]

epochs = 1

from keras.datasets import boston_housing
import numpy as np

##################### set random seed ########################
# Apparently you may use different seed values at each stage
seed_value = 0
# 1. Set the `PYTHONHASHSEED` environment variable to a fixed value
import os
os.environ['PYTHONHASHSEED'] = str(seed_value)
# 2. Seed the Python built-in pseudo-random generator
import random
random.seed(seed_value)
# 3. Seed the numpy pseudo-random generator
np.random.seed(seed_value)
# 4. Seed the tensorflow pseudo-random generator
import tensorflow as tf
tf.random.set_seed(seed_value)
# or, with the TF1-compatible API:
# tf.compat.v1.set_random_seed(seed_value)
##################### set random seed ########################

(train_data, train_targets), (test_data, test_targets) = boston_housing.load_data()


x_train = train_data.astype(np.float32)

from keras import models
from keras import layers

model = models.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(13,)))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(1))
model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])

y_train = train_targets.astype(np.float32)

# get predictions before training
y_pred1 = model.predict(x_train)
model.fit(x_train, y_train, epochs=1, batch_size=404)
print(np.mean((y_train - model.predict(x_train).ravel()) ** 2))

# get predictions after training (batch_size defaults to 32)
y_pred2 = model.predict(x_train)
# get predictions after training, one full batch
y_pred3 = model.predict(x_train, batch_size=404)
# get predictions after training, batches of 100
y_pred4 = model.predict(x_train, batch_size=100)

# hand-computed MSE for each set of predictions
print(np.mean((y_train - y_pred1.ravel()) ** 2))
print(np.mean((y_train - y_pred2.ravel()) ** 2))
print(np.mean((y_train - y_pred3.ravel()) ** 2))
print(np.mean((y_train - y_pred4.ravel()) ** 2))

The results on my machine were attached as an image in the original post (not reproduced here).

epochs = 2

The code is identical to the epochs = 1 run above, except the training line becomes:

model.fit(x_train, y_train, epochs=2, batch_size=404)

If epochs=1000, you will find that the MSE from Keras and the self-implemented MSE output the same value, because by then the model is well trained and its weights no longer change between updates.

In a word, the loss Keras prints is always one epoch behind what we expect: it is computed with the weights as they were before that epoch's updates, while our hand-computed MSE uses the weights after training.
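A minimal sketch to make this visible, reusing model, x_train, y_train from above (with batch_size=404 there is exactly one weight update per epoch):

import numpy as np

history = model.fit(x_train, y_train, epochs=1, batch_size=404)
# loss printed by fit(): computed with the weights *before* the update
print(history.history['loss'][-1])
# hand-computed MSE: uses the weights *after* the update
print(np.mean((y_train - model.predict(x_train).ravel()) ** 2))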

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution sources:

Solution 1
Solution 2
Solution 3: user505794