Torch shape mismatch error while training a GPT2 model
I am trying to train a GPT2 language model for text generation tasks. I am including an additional embedding layer (for POS tags) on top of the token embedding. Training itself completes, but right afterwards, when the script reloads the saved checkpoint with from_pretrained, torch throws the following error:
RuntimeError: Error(s) in loading state_dict for GPT2LMHeadModel:
size mismatch for transformer.postag.weight: copying a param with shape torch.Size([50262, 768]) from checkpoint, the shape in current model is torch.Size([50273, 768]).
The log output below gives a better idea of the state when the error is thrown:
Epoch: 100%|██████████| 2/2 [09:23<00:00, 281.55s/it]
05/06/2022 17:31:13 - INFO - train.py:160 : Saving model checkpoint to runs/gpt2/incar/checkpoint-3144
05/06/2022 17:31:21 - INFO - train.py:168 : Saving model checkpoint to runs/gpt2/incar/checkpoint-3144
05/06/2022 17:31:21 - INFO - train.py:362 : global_step = 3144, average loss = 2.3426512327831968
05/06/2022 17:31:21 - INFO - train.py:368 : Saving model checkpoint to runs/gpt2/incar
Traceback (most recent call last):
  File "train.py", line 394, in <module>
    main()
  File "train.py", line 381, in main
    model = model_class.from_pretrained(args.output_dir)
  File "/data/home1/ssahoo/miniconda3/envs/kgconv2/lib/python3.7/site-packages/transformers/modeling_utils.py", line 1365, in from_pretrained
    _fast_init=_fast_init,
  File "/data/home1/ssahoo/miniconda3/envs/kgconv2/lib/python3.7/site-packages/transformers/modeling_utils.py", line 1512, in _load_state_dict_into_model
    raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
RuntimeError: Error(s) in loading state_dict for GPT2LMHeadModel:
    size mismatch for transformer.postag.weight: copying a param with shape torch.Size([50262, 768]) from checkpoint, the shape in current model is torch.Size([50273, 768]).
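If it helps, I believe the same kind of error can be reproduced with plain PyTorch whenever a parameter's shape at save time differs from its shape at load time (a minimal sketch, not my actual training code; the two sizes are taken from the error message above):

import torch
import torch.nn as nn

# Embedding saved with 50262 rows (what the checkpoint presumably contains).
saved = nn.Embedding(50262, 768)
torch.save(saved.state_dict(), "emb.pt")

# Freshly constructed embedding with 50273 rows (what the new model expects).
fresh = nn.Embedding(50273, 768)
fresh.load_state_dict(torch.load("emb.pt"))
# RuntimeError: Error(s) in loading state_dict for Embedding:
#   size mismatch for weight: copying a param with shape torch.Size([50262, 768])
#   from checkpoint, the shape in current model is torch.Size([50273, 768]).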
Here is a snippet of the base model's constructor that I am currently using:
class GPT2Model(GPT2PreTrainedModel):
    def __init__(self, config=None):
        super().__init__(config)
        self.wte = nn.Embedding(config.vocab_size, config.n_embd)   # token embedding
        self.wpe = nn.Embedding(config.n_positions, config.n_embd)  # position embedding
        self.postag = nn.Embedding(30, config.n_embd)               # extra POS-tag embedding
        self.drop = nn.Dropout(config.embd_pdrop)
        self.h = nn.ModuleList([Block(config.n_ctx, config, scale=True) for _ in range(config.n_layer)])
        self.ln_f = nn.LayerNorm(config.n_embd, eps=config.layer_norm_epsilon)
        self.init_weights()
Here config.vocab_size = 50257 and config.n_positions = 1024
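To double-check which shape was actually stored versus which shape the freshly built model expects, I can compare the two directly (a hypothetical debugging snippet; runs/gpt2/incar is the output directory from the log above, pytorch_model.bin is the default transformers weights file name, and GPT2LMHeadModel is my own subclass that wraps the modified GPT2Model):

import torch
from transformers import GPT2Config

# Shape written at save time, read straight from the checkpoint file.
ckpt = torch.load("runs/gpt2/incar/pytorch_model.bin", map_location="cpu")
print(ckpt["transformer.postag.weight"].shape)

# Shape expected at load time: build the model from the saved config only,
# without loading any weights, then inspect the freshly initialised parameter.
config = GPT2Config.from_pretrained("runs/gpt2/incar")
model = GPT2LMHeadModel(config)  # my modified class, defined in my own module
print(model.state_dict()["transformer.postag.weight"].shape)

If the two printed shapes differ, then whatever determines the size of postag must be changing between training and reloading.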
Can someone help me understand the issue and suggest a fix, please?
Sources
Source: Stack Overflow, licensed under CC BY-SA 3.0.