TypeError: nll_loss_nd(): argument 'input' (position 1) must be Tensor, not tuple
I'm trying to train a BigBird model (BigBirdForSequenceClassification), and the training step fails with the error message below:
Traceback (most recent call last):
  File "C:\Users\######", line 189, in <module>
    train_loss, _ = train()
  File "C:\Users\######", line 152, in train
    loss = cross_entropy(preds, labels)
  File "C:\Users\#####\venv\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\######\venv\lib\site-packages\torch\nn\modules\loss.py", line 211, in forward
    return F.nll_loss(input, target, weight=self.weight, ignore_index=self.ignore_index, reduction=self.reduction)
  File "C:\Users\######\venv\lib\site-packages\torch\nn\functional.py", line 2532, in nll_loss
    return torch._C._nn.nll_loss_nd(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
TypeError: nll_loss_nd(): argument 'input' (position 1) must be Tensor, not tuple
From what I understand, the problem happens because the train() function returns a tuple. How should I approach this issue? How do I change the output of train() so that it returns a tensor instead of a tuple? I have seen similar issues posted here, but none of the solutions seem to help in my case, not even
model = BigBirdForSequenceClassification(config).from_pretrained(checkpoint, return_dict=False)
(When I don't add return_dict=False, I get a similar error message, but it says "TypeError: nll_loss_nd(): argument 'input' (position 1) must be Tensor, not SequenceClassifierOutput".)
Please see my training code below:
def train():
    model.train()
    total_loss = 0
    total_preds = []
    for step, batch in enumerate(train_dataloader):
        if step % 10 == 0 and not step == 0:
            print('Batch {:>5,} of {:>5,}.'.format(step, len(train_dataloader)))
        batch = [r.to(device) for r in batch]
        sent_id, mask, labels = batch
        preds = model(sent_id, mask)
        loss = cross_entropy(preds, labels)
        total_loss = total_loss + loss.item()
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()
        optimizer.zero_grad()
        preds = preds.detach().cpu().numpy()
        total_preds.append(preds)
    avg_loss = total_loss / len(train_dataloader)
    total_preds = np.concatenate(total_preds, axis=0)
    return avg_loss, total_preds
and then:
for epoch in range(epochs):
    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))
    train_loss, _ = train()
    train_losses.append(train_loss)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
I would really appreciate any help with this. Thank you in advance!
Solution 1:[1]
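The model's forward call does not return a bare tensor. With return_dict=False it returns a tuple, and otherwise a SequenceClassifierOutput, so the loss function needs the logits taken from that output rather than the output itself. According to the documentation, the output contains: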
- loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) — Classification (or regression if config.num_labels==1) loss.
- logits (torch.FloatTensor of shape (batch_size, config.num_labels)) — Classification (or regression if config.num_labels==1) scores (before SoftMax).
- hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the initial embedding outputs.
- attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).
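As a minimal sketch of how the logits could be pulled out of either return type, the hypothetical helper below (extract_logits is not part of transformers or torch) checks for a .logits attribute first and falls back to the first tuple element. It assumes the model is called without labels, as in the question, so no loss precedes the logits in the tuple. The stand-in objects only mimic the shape of the model output so that the snippet runs without downloading a checkpoint.

```python
from types import SimpleNamespace


def extract_logits(output):
    """Hypothetical helper: return the logits from a model output."""
    # return_dict=True: the output object exposes a `.logits` attribute.
    if hasattr(output, "logits"):
        return output.logits
    # return_dict=False: a plain tuple. Assumption: the model was called
    # without labels, so the logits are the first element.
    if isinstance(output, tuple):
        return output[0]
    # Anything else is assumed to already be the logits tensor.
    return output


# Stand-ins that mimic the two output shapes (not real model outputs).
fake_logits = [[0.1, 0.9], [0.8, 0.2]]
as_object = SimpleNamespace(logits=fake_logits)
as_tuple = (fake_logits,)

logits_from_object = extract_logits(as_object)
logits_from_tuple = extract_logits(as_tuple)
```

In the train() function from the question, this would amount to replacing preds = model(sent_id, mask) with preds = extract_logits(model(sent_id, mask)) (or simply model(sent_id, mask)[0] when return_dict=False), so that both cross_entropy and the later preds.detach() receive a tensor. The traceback passes through F.nll_loss, which suggests cross_entropy may be an nn.NLLLoss instance. That loss expects log-probabilities, so the raw logits would probably also need torch.log_softmax applied first, or the loss could be switched to nn.CrossEntropyLoss.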
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | evgeni fotia |
