How exactly do class weights intervene in the backpropagation equations of a neural network?

I'm implementing a classification neural network from scratch (no libraries except numpy), following this tutorial: http://neuralnetworksanddeeplearning.com/chap2.html. However, I am dealing with an imbalanced dataset, so I compute a class weight for each category/class of the dataset. I compute the class weight w_i of the i-th category with the following formula (taken from this article: https://www.analyticsvidhya.com/blog/2020/10/improve-class-imbalance-class-weights/ ):

w_i = n / (N * n_i)

where:

  • n is the total number of samples in the dataset;
  • n_i is the number of samples of category i;
  • N is the number of categories (a small numpy sketch of this computation is given right after this list).
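
For clarity, computing these class weights in numpy looks roughly like this. This is only an illustrative sketch; compute_class_weights and labels are assumed names, with labels being a 1-D array of integer category ids:

import numpy as np

def compute_class_weights(labels):
    """Return w_i = n / (N * n_i) for each category i."""
    _, counts = np.unique(labels, return_counts=True)  # n_i for each category
    n = labels.shape[0]    # total number of samples
    N = counts.shape[0]    # number of categories
    return n / (N * counts)

# Example: 3 categories with 100, 10 and 5 samples respectively
labels = np.repeat([0, 1, 2], [100, 10, 5])
class_weights = compute_class_weights(labels)
# -> array([0.3833..., 3.8333..., 7.6666...])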

Now that I have these class weights, how do they intervene in the backpropagation algorithm? Where exactly must I use these w_i coefficients in the backpropagation formulas, and how?

My first idea (which I got from this article: https://peltarion.com/knowledge-center/documentation/modeling-view/build-an-ai-model/class-weights ) is that the loss and the error simply get multiplied by the class weight of the category that is expected for the sample:

# y: expected (one-hot) output of the network
# activations: array of outputs at each layer, thus activations[-1]
#              is the output of the network
# zs: weighted inputs of each layer (i.e. the input vector of a layer
#     before it goes through the activation function)

# y and activations[-1] are of shape (number of categories, 1)

# Get the class weight of the expected category
class_id = np.argmax(y)
cw = class_weights[class_id]

# Loss (only used for metrics): Mean Squared Error (MSE)
loss = 1 / y.shape[0] * np.sum((y - activations[-1]) ** 2)
# Apply class weight on the loss
loss *= cw

# (simplified) MSE cost derivative (error)
error = (activations[-1] - y) * self.dactivation(zs[-1])
# Apply the class weight to the error
error *= cw

# Gradients of the last layer
w_grads[-1] = np.matmul(error, activations[-2].transpose())
b_grads[-1] = error
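
Since my batch size is 200, the same weighting can also be written vectorized over a mini-batch. The following is only an illustrative sketch with dummy data and assumed names (Y, A, Z, A_prev, dsigmoid are not the exact names in my code); it just shows how each column (sample) gets the weight of its expected category:

import numpy as np

rng = np.random.default_rng(0)
num_classes, batch_size, prev_units = 3, 4, 5

def dsigmoid(z):
    # derivative of the sigmoid activation
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

# Dummy mini-batch, one column per sample (placeholders for illustration only)
Y = np.eye(num_classes)[:, rng.integers(0, num_classes, batch_size)]  # one-hot targets
A = rng.random((num_classes, batch_size))       # last-layer activations
Z = rng.random((num_classes, batch_size))       # last-layer weighted inputs
A_prev = rng.random((prev_units, batch_size))   # previous-layer activations
class_weights = np.array([0.38, 3.83, 7.67])    # w_i from the formula above

# One class weight per sample, picked from the expected category of each column
sample_weights = class_weights[np.argmax(Y, axis=0)]    # shape: (batch_size,)

# Weighted output error, then gradients averaged over the mini-batch
error = (A - Y) * dsigmoid(Z) * sample_weights          # broadcasts over rows
w_grad = np.matmul(error, A_prev.T) / batch_size        # shape: (num_classes, prev_units)
b_grad = error.sum(axis=1, keepdims=True) / batch_size  # shape: (num_classes, 1)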

However, this results in very slow training. Here is an extract of the training logs:

Epoch 0: 0.0% (0 / 21892) | loss : 0.1675
Epoch 1: 0.0% (4 / 21892) | loss : 0.1578
Epoch 2: 0.0% (1 / 21892) | loss : 0.1521
Epoch 3: 0.4% (88 / 21892) | loss : 0.1460
Epoch 4: 4.8% (1045 / 21892) | loss : 0.1352
Epoch 5: 6.9% (1514 / 21892) | loss : 0.1230
Epoch 6: 7.9% (1726 / 21892) | loss : 0.1134
Epoch 7: 7.9% (1740 / 21892) | loss : 0.1060
Epoch 8: 8.3% (1807 / 21892) | loss : 0.1006
Epoch 9: 8.6% (1893 / 21892) | loss : 0.0963
Epoch 10: 9.5% (2076 / 21892) | loss : 0.0927
Epoch 11: 10.2% (2228 / 21892) | loss : 0.0894
Epoch 12: 12.9% (2829 / 21892) | loss : 0.0865
Epoch 13: 15.9% (3470 / 21892) | loss : 0.0838
Epoch 14: 18.8% (4109 / 21892) | loss : 0.0815
Epoch 15: 26.0% (5701 / 21892) | loss : 0.0795

The accuracy is calculated on a test dataset, is at 0% after the first epoch (shouldn't it be around 50%?), and then increases very slowly. The batch size is 200 samples. Training a Keras model with the same layers, activation functions, loss and class weights on the same dataset is much faster and more effective (reaching up to 92% accuracy after 30 epochs).
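
For reference, on the Keras side the class weights are passed through the class_weight argument of model.fit. A rough sketch of that setup (the architecture, sizes and data below are placeholders, not my exact model):

import numpy as np
from tensorflow import keras

input_dim, num_classes = 20, 3                  # placeholder sizes
class_weights = np.array([0.38, 3.83, 7.67])    # w_i from the formula above

model = keras.Sequential([
    keras.Input(shape=(input_dim,)),
    keras.layers.Dense(64, activation="sigmoid"),
    keras.layers.Dense(num_classes, activation="sigmoid"),
])
model.compile(optimizer="sgd", loss="mse", metrics=["accuracy"])

# Dummy data so the snippet runs; class_weight maps category id -> w_i,
# and Keras scales each sample's loss by the weight of its category
X_train = np.random.random((1000, input_dim))
Y_train = np.eye(num_classes)[np.random.randint(0, num_classes, 1000)]
model.fit(X_train, Y_train, batch_size=200, epochs=30,
          class_weight={i: float(w) for i, w in enumerate(class_weights)})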

Thanks in advance for your help.


