'TypeError: nll_loss_nd(): argument 'input' (position 1) must be Tensor, not tuple

So I'm trying to train my BigBird model (BigBirdForSequenceClassification) and I got to the moment of the training, which ends with below error message:

Traceback (most recent call last):
  File "C:\Users\######", line 189, in <module>
    train_loss, _ = train()  
  File "C:\Users\######", line 152, in train
    loss = cross_entropy(preds, labels)
  File "C:\Users\#####\venv\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\######\venv\lib\site-packages\torch\nn\modules\loss.py", line 211, in forward
    return F.nll_loss(input, target, weight=self.weight, ignore_index=self.ignore_index, reduction=self.reduction)
  File "C:\Users\######\venv\lib\site-packages\torch\nn\functional.py", line 2532, in nll_loss
    return torch._C._nn.nll_loss_nd(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
TypeError: nll_loss_nd(): argument 'input' (position 1) must be Tensor, not tuple

From what I understand, the problem happens because the train() function returns the tuple. Now - my question is how I should approach such issue? How do I change the output of train() function to return tensor instead of tuple? I have seen similar issues posted here but none of the solutions seems to be helpful in my case, not even

model = BigBirdForSequenceClassification(config).from_pretrained(checkpoint, return_dict=False)

(When I don't add return_dict=False I got similiar error message but it says "TypeError: nll_loss_nd(): argument 'input' (position 1) must be Tensor, not SequenceClassifierOutput" Please see my train code below:

def train():
    model.train()
    total_loss = 0
    total_preds = []
    
    for step, batch in enumerate(train_dataloader):
        
        if step % 10 == 0 and not step == 0:
            print('Batch {:>5,}  of  {:>5,}.'.format(step, len(train_dataloader)))
            
        batch = [r.to(device) for r in batch]
        sent_id, mask, labels = batch

        preds = model(sent_id, mask)

        loss = cross_entropy(preds, labels)
        total_loss = total_loss + loss.item()
        loss.backward()

        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()
        optimizer.zero_grad()
        preds = preds.detach().cpu().numpy()
        total_preds.append(preds)

    avg_loss = total_loss / len(train_dataloader)
    total_preds = np.concatenate(total_preds, axis=0)
    return avg_loss, total_preds

and then:

for epoch in range(epochs):
    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))

    train_loss, _ = train()  
    train_losses.append(train_loss)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

I will really appreciate any help on this case and thank you in advance!

Solution 1:^[1]

This is what it returns

loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) — Classification (or regression if config.num_labels==1) loss.
logits (torch.FloatTensor of shape (batch_size, config.num_labels)) — Classification (or regression if config.num_labels==1) scores (before SoftMax).
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	evgeni fotia

'TypeError: nll_loss_nd(): argument 'input' (position 1) must be Tensor, not tuple

Solution 1:[1]

Sources

Related Questions

Solution 1:^[1]