Picking the right metric for a model ending with a TimeDistributed layer

I am training the model below for an NER task and am unsure which metric to use. I expected a classic CategoricalCrossentropy setup to work, but:

  • the model reports an accuracy of zero during both training and testing
  • however, when I calculate the accuracy manually, it is definitely not zero
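For reference, a manual per-token accuracy check is usually an argmax comparison along the class axis. The sketch below uses made-up NumPy arrays (4 classes instead of the model's 16, and a single 3-token sequence); the names `y_true`, `y_pred` and `manual_accuracy` are illustrative and not taken from the original post.

```python
import numpy as np

# Hypothetical toy data: 1 sequence, 3 tokens, 4 classes (the real model uses 16).
y_true = np.array([[[1, 0, 0, 0],
                    [0, 1, 0, 0],
                    [0, 0, 0, 1]]], dtype=np.float32)
y_pred = np.array([[[0.7, 0.1, 0.1, 0.1],
                    [0.2, 0.6, 0.1, 0.1],
                    [0.1, 0.5, 0.2, 0.2]]], dtype=np.float32)

# Most likely class per token versus the true class per token.
predicted_classes = y_pred.argmax(axis=-1)
true_classes = y_true.argmax(axis=-1)

# Fraction of tokens whose predicted class matches the target.
manual_accuracy = float(np.mean(predicted_classes == true_classes))
print(manual_accuracy)
```

Here two of the three tokens are classified correctly, so the manual figure is clearly above zero.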

I am not familiar with the TimeDistributed layer, and I suspect the issue might come from there. The output of the TimeDistributed layer and my targets have the same shape.
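To check whether TimeDistributed is a plausible culprit, it may help to look at what the layer does to a 3D tensor. The sketch below assumes random input features with hypothetical sizes (2 sequences, 5 tokens, 8 features) and compares the wrapped head with a bare Dense layer; it is an illustration, not part of the original model.

```python
import tensorflow as tf

# Hypothetical sizes: 2 sequences, 5 tokens, 8 features per token.
features = tf.random.uniform((2, 5, 8))

# The same kind of head as in the model: Dense(16, softmax) applied at every time step.
wrapped = tf.keras.layers.TimeDistributed(
    tf.keras.layers.Dense(16, activation="softmax"))
wrapped_out = wrapped(features)

# A bare Dense layer also operates on the last axis of a 3D tensor.
plain_out = tf.keras.layers.Dense(16, activation="softmax")(features)

# Each token gets its own probability distribution over the 16 classes.
row_sums = tf.reduce_sum(wrapped_out, axis=-1).numpy()

print(wrapped_out.shape, plain_out.shape)
```

Both outputs have shape (batch, sequence length, 16) and each token's probabilities sum to roughly 1, which suggests the layer itself is behaving as expected and the zero accuracy probably originates elsewhere.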

What am I missing?

Here is my code:

import tensorflow as tf

# SEQ_LEN and backbone are defined elsewhere.

def init_model():
    input_ids = tf.keras.layers.Input(shape=(SEQ_LEN,), dtype='int32')
    attention_mask = tf.keras.layers.Input(shape=(SEQ_LEN,), dtype='int32')

    # Use the first output of the backbone and keep its weights frozen.
    x = backbone({'input_ids': input_ids,
                  'attention_mask': attention_mask})[0]
    backbone.trainable = False

    # First bidirectional LSTM block.
    x = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(units=512,
                                                           activation='tanh',
                                                           #recurrent_dropout=.2,
                                                           dropout=.2,
                                                           return_sequences=True))(x)
    #x = tf.keras.layers.LayerNormalization()(x)

    # Second bidirectional LSTM block, added back to its input as a residual.
    x_res = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(units=512,
                                                               activation='tanh',
                                                               #recurrent_dropout=.2,
                                                               dropout=.2,
                                                               return_sequences=True))(x)
    x = tf.keras.layers.add([x, x_res])

    # Per-token softmax over the 16 tag classes.
    output = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(16, activation='softmax'))(x)

    model = tf.keras.models.Model(inputs={'input_ids': input_ids,
                                          'attention_mask': attention_mask},
                                  outputs=output)
    return model

And the compile step:

loss = tf.keras.losses.CategoricalCrossentropy(name='categorical_crossentropy')
metric = tf.keras.metrics.Accuracy(name='accuracy')
opt = tf.keras.optimizers.Adam()

model.compile(optimizer=opt,loss=loss,metrics=[metric])
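One likely explanation, based on how the two Keras metric classes are documented: `tf.keras.metrics.Accuracy` counts how often `y_pred` equals `y_true` element by element, so softmax probabilities practically never match one-hot targets, whereas `tf.keras.metrics.CategoricalAccuracy` compares the argmax over the last axis. The sketch below uses made-up tensors of shape (batch, sequence length, classes) with 3 classes instead of 16; the values and variable names are assumptions for illustration only.

```python
import numpy as np
import tensorflow as tf

# Toy one-hot targets and softmax-like predictions (values are made up).
y_true = np.array([[[1, 0, 0], [0, 1, 0], [0, 0, 1]]], dtype="float32")
y_pred = np.array([[[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.1, 0.5, 0.4]]],
                  dtype="float32")

# Accuracy tests y_true == y_pred element-wise; probabilities never equal 0 or 1 here.
exact = tf.keras.metrics.Accuracy()
exact.update_state(y_true, y_pred)
exact_value = float(exact.result())

# CategoricalAccuracy compares the argmax over the class axis, token by token.
categorical = tf.keras.metrics.CategoricalAccuracy()
categorical.update_state(y_true, y_pred)
categorical_value = float(categorical.result())

print(exact_value, categorical_value)
```

If this is indeed the cause, replacing the metric in the compile step with `tf.keras.metrics.CategoricalAccuracy(name='accuracy')` should make the reported accuracy agree with a manual argmax-based calculation.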


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
