Torch shape mismatch error when loading a GPT2 model after training

I am training a GPT2 language model for text generation, and I am adding an extra embedding layer (for POS tags) on top of the token embedding. Training completes, but when the final checkpoint is reloaded, torch throws the following error:

RuntimeError: Error(s) in loading state_dict for GPT2LMHeadModel:
        size mismatch for transformer.postag.weight: copying a param with shape torch.Size([50262, 768]) from checkpoint, the shape in current model is torch.Size([50273, 768]).

Here is the full log and traceback at the point where the error is thrown:

Epoch: 100%|██████████| 2/2 [09:23<00:00, 281.55s/it]
05/06/2022 17:31:13 - INFO - train.py:160 : Saving model checkpoint to runs/gpt2/incar/checkpoint-3144
05/06/2022 17:31:21 - INFO - train.py:168 : Saving model checkpoint to runs/gpt2/incar/checkpoint-3144
05/06/2022 17:31:21 - INFO - train.py:362 :  global_step = 3144, average loss = 2.3426512327831968
05/06/2022 17:31:21 - INFO - train.py:368 : Saving model checkpoint to runs/gpt2/incar
Traceback (most recent call last):
  File "train.py", line 394, in <module>
    main()
  File "train.py", line 381, in main
    model = model_class.from_pretrained(args.output_dir)
  File "/data/home1/ssahoo/miniconda3/envs/kgconv2/lib/python3.7/site-packages/transformers/modeling_utils.py", line 1365, in from_pretrained
    _fast_init=_fast_init,
  File "/data/home1/ssahoo/miniconda3/envs/kgconv2/lib/python3.7/site-packages/transformers/modeling_utils.py", line 1512, in _load_state_dict_into_model
    raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
RuntimeError: Error(s) in loading state_dict for GPT2LMHeadModel:
        size mismatch for transformer.postag.weight: copying a param with shape torch.Size([50262, 768]) from checkpoint, the shape in current model is torch.Size([50273, 768]).
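
For what it's worth, this looks like the generic PyTorch size-mismatch error that load_state_dict raises when a saved tensor's shape differs from the corresponding parameter in the freshly constructed model. A minimal standalone reproduction of the same kind of error (the sizes here are taken from the traceback, not from my training code):

import torch.nn as nn

# Hypothetical repro: the checkpoint tensor was saved with one embedding size,
# but the freshly constructed module declares a different size.
saved = nn.Embedding(50262, 768)
current = nn.Embedding(50273, 768)

try:
    current.load_state_dict(saved.state_dict())
except RuntimeError as e:
    print(e)
# -> size mismatch for weight: copying a param with shape torch.Size([50262, 768])
#    from checkpoint, the shape in current model is torch.Size([50273, 768]).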

Here is a snippet of the base model's constructor that I am currently using:

import torch.nn as nn

class GPT2Model(GPT2PreTrainedModel):
    def __init__(self, config=None):
        super().__init__(config)
        self.wte = nn.Embedding(config.vocab_size, config.n_embd)   # token embeddings
        self.wpe = nn.Embedding(config.n_positions, config.n_embd)  # position embeddings
        self.postag = nn.Embedding(30, config.n_embd)               # extra POS-tag embeddings
        self.drop = nn.Dropout(config.embd_pdrop)
        self.h = nn.ModuleList([Block(config.n_ctx, config, scale=True) for _ in range(config.n_layer)])
        self.ln_f = nn.LayerNorm(config.n_embd, eps=config.layer_norm_epsilon)
        self.init_weights()

Here config.vocab_size = 50257 and config.n_positions = 1024.
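
For context, the idea is to add the POS-tag embedding to the token and position embeddings before the transformer blocks, roughly like this (a simplified sketch with made-up tensors, not my actual forward method):

import torch
import torch.nn as nn

# Standalone sketch of the intended embedding sum; sizes mirror the config above.
wte = nn.Embedding(50257, 768)   # token embeddings
wpe = nn.Embedding(1024, 768)    # position embeddings
postag = nn.Embedding(30, 768)   # POS-tag embeddings

input_ids = torch.randint(0, 50257, (1, 10))
position_ids = torch.arange(10).unsqueeze(0)
postag_ids = torch.randint(0, 30, (1, 10))

hidden_states = wte(input_ids) + wpe(position_ids) + postag(postag_ids)
print(hidden_states.shape)   # torch.Size([1, 10, 768])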

Can someone help me understand the issue and suggest a fix, please?


