How to convert a batch of logits of an (Embedding + Linear layer) model to a one-hot encoded batch? (PyTorch)

I am building a FlauBertForClassification model from scratch (I don't want to use the FlauBertForSequenceClassification model from Hugging Face). Here is the model class:

import torch.nn as nn
from transformers import FlaubertModel

class FlauBertForClassification(nn.Module):
    def __init__(self, embedding_dim, num_intents, dropout):
        super(FlauBertForClassification, self).__init__()
        self.flaubert = FlaubertModel.from_pretrained("flaubert/flaubert_base_cased")
        self.dropout = nn.Dropout(dropout)
        self.linear = nn.Linear(embedding_dim, num_intents)

    def forward(self, utterances, masks):
        # last hidden state: (batch size x sequence length x embedding dim)
        embeddings = self.flaubert(input_ids=utterances, attention_mask=masks)[0]
        dropout_output = self.dropout(embeddings)  # (batch size x sequence length x embedding dim)
        logits = self.linear(dropout_output)       # (batch size x sequence length x num_intents)
        return logits
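
For context, here is roughly how I apply the model. The sizes and dummy tensors below are placeholders just to illustrate the shapes I am describing (in my real code the ids and masks come from the FlauBERT tokenizer):

import torch

batch_size, seq_len, embedding_dim, num_intents = 4, 12, 768, 7  # hypothetical sizes (768 = flaubert_base hidden size)

model = FlauBertForClassification(embedding_dim, num_intents, dropout=0.1)
model.eval()

utterances = torch.randint(0, 1000, (batch_size, seq_len))   # placeholder token ids
masks = torch.ones(batch_size, seq_len, dtype=torch.long)    # attention mask (all tokens kept)

with torch.no_grad():
    logits = model(utterances, masks)
print(logits.shape)  # torch.Size([4, 12, 7]) -> (batch size, sequence length, number of classes)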
  1. When applying the model, the logits are of shape (batch size x sequence length x number of classes). How can I get rid of the sequence length dimension so that I can compute metrics such as accuracy against the true labels (of shape: batch size x number of classes)?

  2. I don't understand why computing the loss (CrossEntropy) worked "correctly" without any modification of the model outputs (a minimal sketch of what I mean is below).
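
For reference, here is a minimal sketch of what I believe the shape situation is when the loss is computed. The tensors and sizes are made up, and whether this matches what actually happens in my training loop is exactly what I am unsure about:

import torch
import torch.nn as nn

batch_size, seq_len, num_classes = 4, 12, 7  # hypothetical sizes

logits = torch.randn(batch_size, seq_len, num_classes)   # raw model output, sequence length never removed
labels = torch.randint(0, 2, (batch_size, num_classes))  # one-hot style targets: (batch size, number of classes)

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, labels)  # runs without raising an error
print(loss)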


