Sentence Pair Classification using BERT Transformers: ValueError
I was doing sentence pair classification using BERT. First, I encode the sentence pairs as
train_encode = tokenizer(train1, train2, padding="max_length", truncation=True)
test_encode = tokenizer(test1, test2, padding="max_length", truncation=True)
where train1 and train2 are lists containing the first and second sentences of each pair (and likewise for test1 and test2).
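For context, a minimal sketch of this step, assuming checkpoint is something like bert-base-uncased and using two toy sentence lists, since the post does not show how the tokenizer is loaded:
from transformers import BertTokenizerFast

# assumed checkpoint; the post does not show how it was set
checkpoint = "bert-base-uncased"
tokenizer = BertTokenizerFast.from_pretrained(checkpoint)

# toy sentence lists standing in for train1/train2
train1 = ["Reading is fun.", "This device is broken."]
train2 = ["Books are great.", "It stopped working yesterday."]

train_encode = tokenizer(train1, train2, padding="max_length", truncation=True)
print(train_encode.keys())  # dict_keys(['input_ids', 'token_type_ids', 'attention_mask'])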
Then I did:
train_seq = torch.tensor(train_encode['input_ids'])
train_mask = torch.tensor(train_encode['attention_mask'])
train_token = torch.tensor(train_encode['token_type_ids'])
train_y = torch.tensor(y_train.tolist())
And created a TensorDataset for the data loader as
train_data = TensorDataset(train_seq, train_mask, train_token, train_y)
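The post does not show how train_dataloader is built from train_data; a plausible sketch, assuming a batch size of 8 (a guess based on the label batch printed further down), would be:
from torch.utils.data import DataLoader, RandomSampler

# hypothetical DataLoader construction; batch_size=8 is inferred from the
# printed label batch tensor([3, 4, 1, 3, 1, 4, 2, 1]) shown later in the post
train_dataloader = DataLoader(train_data,
                              sampler=RandomSampler(train_data),
                              batch_size=8)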
And defined bert as
model1 = BERT_Arch(model)
optimizer = AdamW(model1.parameters(), lr=0.01)
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model1.to(device)
Model is defined as
model = BertForSequenceClassification.from_pretrained(checkpoint, num_labels=5)
class BERT_Arch(nn.Module):
    def __init__(self, bert):
        super(BERT_Arch, self).__init__()
        self.bert = bert
        self.dropout = nn.Dropout(0.1)
        self.relu = nn.ReLU()
        self.fc1 = nn.Linear(768, 512)
        self.fc2 = nn.Linear(512, 5)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input_ids, attn_masks, token_type_ids):
        _, cls_hs = self.bert(input_ids, attn_masks, token_type_ids)
        x = self.fc1(cls_hs)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = self.softmax(x)
        return x
I created the training loop as
EPOCHS = 5
criterion = nn.CrossEntropyLoss()
total_loss, total_accuracy = 0, 0
total_preds=[]
for epoch in range(EPOCHS):
  model1.train()
  total_train_loss = 0
  total_train_acc  = 0
  for step, batch in enumerate(train_dataloader):
    batch = [r.to(device) for r in batch]
    input_id, attention_mask, token_type_id, y = batch
    optimizer.zero_grad()
    labels = y
    model1.zero_grad()
    prediction = model1(input_id, attention_mask, token_type_id)
    loss = criterion(prediction, labels)
    total_loss = total_loss + loss.item()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model1.parameters(), 1.0)
    optimizer.step()
    preds = prediction.detach().cpu().numpy()
    total_preds.append(preds)
  avg_loss = total_loss / len(train_dataloader)
  total_preds = np.concatenate(total_preds, axis=0)
print(avg_loss)
I get the following error: ValueError: not enough values to unpack (expected 2, got 1)
I am not sure what I am doing wrong here. Any suggestions?
The output of
for step,batch in enumerate(train_dataloader):
  batch = [r.to(device) for r in batch]
  input_id,attention_mask,token_type_id,y = batch
is
tensor([[  101,  3191,  1999,  ...,     0,     0,     0],
    [  101,  2023, 11204,  ...,     0,     0,     0],
    [  101,  6140,  1996,  ...,     0,     0,     0],
    ...,
    [  101,  2023, 11204,  ...,     0,     0,     0],
    [  101,  2275,  2039,  ...,     0,     0,     0],
    [  101,  2023,  2240,  ...,     0,     0,     0]], device='cuda:0')
 tensor([[1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        ...,
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0]], device='cuda:0')
tensor([[0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        ...,
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0]], device='cuda:0')
tensor([3, 4, 1, 3, 1, 4, 2, 1], device='cuda:0')
Solution 1:[1]
The problem is the line where you call BERT's forward pass inside your model and then try to unpack the return value:
_, cls_hs = self.bert(input_ids, attn_masks, token_type_ids)
By default, BertForSequenceClassification returns a SequenceClassifierOutput object, not a tuple, so that line throws an error: a single SequenceClassifierOutput object cannot be unpacked into _, cls_hs.
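For illustration (this snippet is not from the original post), you can check the return type with a toy batch and access its fields by name instead of unpacking:
import torch

# toy batch just for illustration; in the question these tensors come from the DataLoader
device = next(model.parameters()).device  # use whatever device the model is on
input_ids = torch.tensor([[101, 2023, 102]], device=device)
attention_mask = torch.tensor([[1, 1, 1]], device=device)
token_type_ids = torch.tensor([[0, 0, 0]], device=device)

outputs = model(input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)
print(type(outputs))         # SequenceClassifierOutput
print(outputs.logits.shape)  # torch.Size([1, 5]); fields are accessed by name, not unpacked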
If you want to return a tuple instead, you should modify the line where you load BertForSequenceClassification by adding return_dict=False.
In addition, by default, BERT does not return any hidden states. If you want to get the model's last hidden state as well as the output logits, you should add output_hidden_states=True. Otherwise the forward pass will simply return one value and _, cls_hs will throw the same error.
So, you could load the model like this instead:
model = BertForSequenceClassification.from_pretrained(
    checkpoint, 
    num_labels=5, 
    return_dict=False, 
    output_hidden_states=True
)
Alternatively, you could modify your forward pass code in BERT_Arch to work with SequenceClassifierOutput.
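A sketch of that alternative (not part of the original answer) could keep the default SequenceClassifierOutput and pull the [CLS] vector out of the hidden states by name; output_hidden_states=True is still needed, either when loading the model or passed in the call as below:
def forward(self, input_ids, attn_masks, token_type_ids):
    # keep the default SequenceClassifierOutput and access its fields by name
    outputs = self.bert(input_ids,
                        attention_mask=attn_masks,
                        token_type_ids=token_type_ids,
                        output_hidden_states=True)
    # hidden_states[-1] is the last encoder layer; position 0 in the sequence is the [CLS] token
    cls_hs = outputs.hidden_states[-1][:, 0, :]
    x = self.fc1(cls_hs)
    x = self.relu(x)
    x = self.dropout(x)
    x = self.fc2(x)
    x = self.softmax(x)
    return x
With this version the model can be loaded without return_dict=False.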
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source | 
|---|---|
| Solution 1 | BrokenBenchmark | 
