Recently, I built a GRU model with PyTorch. When the model has one nn.GRU layer, it runs well, but when there is more than one GRU layer, the model reports
I am training the coarse-to-fine coreference model from AllenNLP for a language other than English, using the template configs from bert_lstm.jsonnet. When I rep