Keras LSTM: set class weights with sample_weight for 3D data

I am implementing a sequential multiclass labeling model for text data and have a very unbalanced training set. The labels occur with the following (rounded) relative frequencies:

Label 0: 0.9297
Label 1: 0.0337
Label 2: 0.0337
Label 3: 0.0011
Label 4: 0.0011

Because of this imbalance, my current model predicts only Label 0. My code looks like this:

X_train.shape
-> (784,300,7)

y_train.shape
-> (784,300,1)

model = Sequential()
model.add(LSTM(150, input_shape=(300, 7), return_sequences=True))
model.add(Dense(len(label2index), activation="softmax"))
model.compile(loss="sparse_categorical_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])
model.summary()
model.fit(X_train, y_train, epochs=25)
y_pred = model.predict(X_test, verbose=1)

So each sample consists of 300 tokens, and each token is to be labeled. I thought I could handle the unbalanced data with class weights. My first approach was to pass the class_weight argument to model.fit together with a dictionary mapping each label to a weight. But this raised an exception: class_weight not supported for 3+ dimensional targets.
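For reference, the exception can be reproduced with a tiny stand-in model (hypothetical toy dimensions in place of my real (784, 300, 7) data, assuming TensorFlow 2.x):

```python
import numpy as np
import tensorflow as tf

# Hypothetical tiny stand-in data: 8 samples, 6 timesteps, 7 features, 5 classes.
X = np.random.rand(8, 6, 7).astype("float32")
y = np.random.randint(0, 5, size=(8, 6, 1))

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(4, input_shape=(6, 7), return_sequences=True),
    tf.keras.layers.Dense(5, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")

try:
    # class_weight with a 3D target (batch, timesteps, 1) is rejected by Keras
    model.fit(X, y, epochs=1, class_weight={i: 1.0 for i in range(5)}, verbose=0)
except ValueError as e:
    print(e)
```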

Now I've read that for 3D data you should use sample_weight instead of class_weight, but how exactly does that apply to my data? The posts I have found about this confuse me quite a bit. Does anyone have tips on how to use it?



Solution 1:[1]

You cannot use a single per-sample weight, because each sample (sequence) contains many different classes.

It also seems that you cannot resample your data, because the classes are not independent samples but parts of a sequence.

So you will need a custom loss function:


def weightedLoss(weights=tf.constant([1.0, 1.0, 1.0, 1.0, 1.0])):
    def innerLoss(true, pred):
        # look up the weight of each token's label: (batch, timesteps, 1)
        w = tf.gather(weights, tf.cast(true, tf.int32))
        w = tf.squeeze(w, axis=-1)  # (batch, timesteps)

        # per-token cross-entropy: (batch, timesteps)
        loss = tf.keras.backend.sparse_categorical_crossentropy(true, pred)
        loss = w * loss

        return loss

    return innerLoss

Choose proper weights and use it:

weights = tf.constant([0.215, 5.9, 5.9, 166.7, 166.7]) 
model.compile(loss=weightedLoss(weights), optimizer = 'adam')
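Putting it together, here is a minimal end-to-end sketch (hypothetical toy dimensions in place of the real (784, 300, 7) data, assuming TensorFlow 2.x):

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy sizes mirroring the (samples, timesteps, features) layout.
n_samples, timesteps, n_features, n_classes = 8, 6, 7, 5
X = np.random.rand(n_samples, timesteps, n_features).astype("float32")
y = np.random.randint(0, n_classes, size=(n_samples, timesteps, 1))

def weightedLoss(weights):
    def innerLoss(true, pred):
        # per-token weight looked up from the integer label: (batch, timesteps)
        w = tf.squeeze(tf.gather(weights, tf.cast(true, tf.int32)), axis=-1)
        # per-token cross-entropy: (batch, timesteps)
        loss = tf.keras.backend.sparse_categorical_crossentropy(true, pred)
        return w * loss
    return innerLoss

weights = tf.constant([0.215, 5.9, 5.9, 166.7, 166.7])

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(16, input_shape=(timesteps, n_features), return_sequences=True),
    tf.keras.layers.Dense(n_classes, activation="softmax"),
])
model.compile(loss=weightedLoss(weights), optimizer="adam")
model.fit(X, y, epochs=1, verbose=0)
```

Each token's cross-entropy is scaled by the weight of its true label, so rare-class tokens contribute more to the gradient.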

Choosing the weights

For the weights, I used a "try not to change the learning rate" approach. The proportions of the weights are clear: more weight for the rare classes, less for the common class.

So the thinking process goes like this:

#class proportions (must sum to 1)
proportions = [.9298, .0339, .0339, .0012, .0012]
originalWeights = [1,1,1,1,1] #if you don't set weights, they're all 1

#number of classes
n = len(proportions)

#how do the original weights (1,1,1,1,1) change the LR?
#each weight multiplies its respective proportion of elements
originalLrChange = sum([w*p for (w,p) in zip(originalWeights, proportions)])
print("originalLRChange", originalLrChange)

#lets propose weights that are proportionally correct, without thinking of the LR
proposedWeights = [1/a for a in proportions]
print("proposedWeights before", proposedWeights)

#how do the new weights change the LR?
proposedLrChange = sum([w*p for (w,p) in zip(proposedWeights, proportions)])
print("proposedLrChange", proposedLrChange)

#Let's then adjust for keeping roughly no LR change
adjustedWeights = [w/n for w in proposedWeights]
print("adjustedWeights", adjustedWeights)

#how do the adjusted weights change the LR?
adjustedLrChange = sum([w*p for (w,p) in zip(adjustedWeights, proportions)])
print("adjustedLrChange", adjustedLrChange)

Outputs:

originalLRChange 1.0
proposedWeights before [1.0755001075500108, 29.498525073746315, 29.498525073746315, 833.3333333333334, 833.3333333333334]
proposedLrChange 5.0
adjustedWeights [0.21510002151000215, 5.899705014749263, 5.899705014749263, 166.66666666666669, 166.66666666666669]
adjustedLrChange 1.0

So, in short, you can get the same result with this:

proportions = [.9298, .0339, .0339, .0012, .0012]
inverseN = 1 / len(proportions)
weights = [inverseN/proportion for proportion in proportions]
print(weights)
weights = tf.constant(weights)
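As a quick sanity check, the shortcut weights leave the effective LR multiplier at 1:

```python
proportions = [.9298, .0339, .0339, .0012, .0012]
inverseN = 1 / len(proportions)
weights = [inverseN / p for p in proportions]

# weighted average of the weights by class frequency = effective LR multiplier
lr_change = sum(w * p for w, p in zip(weights, proportions))
print(lr_change)  # approximately 1.0, up to floating-point rounding
```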

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
