How to run inference on a new text sample (PyTorch, NLP, document classification)

I’m working on a clone of the following project (document classification) in a virtual machine: https://github.com/uvipen/Very-deep-cnn-pytorch I trained the model and produced both the model files and the checkpoint files using the following Colab notebook (I only use DBpedia and YahooAnswers):

https://colab.research.google.com/drive/17xsJ1oRgQl-m5Re_B6I2-cGxNqAByyYA?usp=sharing

I then took the model/checkpoint files generated by this Colab notebook into the project and loaded the model from the checkpoint (for example, with DBpedia):

dbpedia_ckp_path = "./checkpoint/dbpedia_current_checkpoint.pt"
dbpedia_PATH = "./best_model/dbpedia_best_model.pt"

dbpedia_model = VDCNN(n_classes=14, num_embedding=len("""abcdefghijklmnopqrstuvwxyz0123456789,;.!?:'"/\|_@#$%^&*~`±=<>()[]{}""") + 1, embedding_dim=16, depth=9, n_fc_neurons=2048, shortcut=False)

dbpedia_model.load_state_dict(torch.load(dbpedia_PATH, map_location=torch.device('cpu'))['state_dict'])

dbpedia_model.eval()

optimizer = torch.optim.Adam(dbpedia_model.parameters(), lr=0.001)
loaded_dbpedia_model, optimizer, start_epoch, valid_loss_min = load_ckp(dbpedia_ckp_path, dbpedia_model, optimizer)

print("model = ", loaded_dbpedia_model)
print("optimizer = ", optimizer)
print("start_epoch = ", start_epoch)
print("valid_loss_min = ", valid_loss_min)
print("valid_loss_min = {:.6f}".format(valid_loss_min))

loaded_dbpedia_model = loaded_dbpedia_model.to("cpu")
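(For readers: `load_ckp` is a helper from the checkpointing tutorial the notebook follows; its definition is not shown above. A minimal sketch of what it might look like, assuming the checkpoint dict was saved with the keys `state_dict`, `optimizer`, `epoch`, and `valid_loss_min` — the key names are inferred from the prints below, not verified against the notebook:)

```python
import torch

def load_ckp(checkpoint_path, model, optimizer):
    """Restore model/optimizer state from a checkpoint saved with torch.save.

    Assumes the checkpoint is a dict with the keys 'state_dict',
    'optimizer', 'epoch', and 'valid_loss_min'.
    """
    checkpoint = torch.load(checkpoint_path, map_location="cpu")
    model.load_state_dict(checkpoint["state_dict"])
    optimizer.load_state_dict(checkpoint["optimizer"])
    return model, optimizer, checkpoint["epoch"], checkpoint["valid_loss_min"]
```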

Now I want to run inference on a new text sample and make a prediction, for example:

ex_text_str = "Brekke Church (Norwegian: Brekke kyrkje) is a parish church in Gulen Municipality in Sogn og Fjordane county, Norway. It is located in the village of Brekke. The church is part of the Brekke parish in the Nordhordland deanery in the Diocese of Bjørgvin. The white, wooden church, which has 390 seats, was consecrated on 19 November 1862 by the local Dean Thomas Erichsen. The architect Christian Henrik Grosch made the designs for the church, which is the third church on the site."

I’ve tried following this tutorial in order to classify it (using its predict method): https://pytorch.org/tutorials/beginner/text_sentiment_ngrams_tutorial.html

and using this chunk of code:

dbpedia_train_iter = DBpedia(split='train')
dbpedia_tokens = yield_tokens(dbpedia_train_iter)
vocab = build_vocab_from_iterator(dbpedia_tokens, specials=["<unk>"])
vocab.set_default_index(vocab["<unk>"])
text_pipeline = lambda x: vocab(tokenizer(x))

But on one VM I get the following error:

Traceback (most recent call last):
  File "/home/harel/PycharmProjects/Very-deep-cnn-pytorch/classify_sample.py", line 154, in <module>
    yahoo_vocab = build_vocab_from_iterator(yahoo_tokens, specials=["<unk>"])
  File "/usr/local/lib/python3.8/dist-packages/torchtext/vocab/vocab_factory.py", line 92, in build_vocab_from_iterator
    for tokens in iterator:
  File "/home/harel/PycharmProjects/Very-deep-cnn-pytorch/classify_sample.py", line 70, in yield_tokens
    for _, text in data_iter:
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_typing.py", line 366, in wrap_generator
    response = gen.send(None)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/datapipes/iter/callable.py", line 112, in __iter__
    for data in self.datapipe:
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_typing.py", line 366, in wrap_generator
    response = gen.send(None)
  File "/usr/local/lib/python3.8/dist-packages/torchdata/datapipes/iter/util/plain_text_reader.py", line 148, in __iter__
    for path, file in self.source_datapipe:
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_typing.py", line 366, in wrap_generator
    response = gen.send(None)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/datapipes/iter/fileopener.py", line 60, in __iter__
    yield from get_file_binaries_from_pathnames(self.datapipe, self.mode, self.encoding)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/datapipes/utils/common.py", line 85, in get_file_binaries_from_pathnames
    for pathname in pathnames:
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_typing.py", line 366, in wrap_generator
    response = gen.send(None)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/datapipes/iter/combining.py", line 46, in __iter__
    for data in dp:
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_typing.py", line 366, in wrap_generator
    response = gen.send(None)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/datapipes/iter/filelister.py", line 51, in __iter__
    for path in self.datapipe:
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_typing.py", line 366, in wrap_generator
    response = gen.send(None)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/datapipes/iter/grouping.py", line 140, in __iter__
    for element in self.datapipe:
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_typing.py", line 366, in wrap_generator
    response = gen.send(None)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/datapipes/iter/callable.py", line 112, in __iter__
    for data in self.datapipe:
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_typing.py", line 356, in __next__
    return next(self.iterator)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/datapipes/iter/combining.py", line 190, in get_generator_by_instance
    yield from self.main_datapipe.get_next_element_by_instance(self.instance_id)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/datapipes/iter/combining.py", line 301, in get_next_element_by_instance
    yield self._find_next(instance_id)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/datapipes/iter/combining.py", line 275, in _find_next
    value = next(self._datapipe_iterator)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_typing.py", line 366, in wrap_generator
    response = gen.send(None)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/datapipes/iter/combining.py", line 46, in __iter__
    for data in dp:
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_typing.py", line 366, in wrap_generator
    response = gen.send(None)
  File "/usr/local/lib/python3.8/dist-packages/torchdata/datapipes/iter/util/saver.py", line 48, in __iter__
    for filepath, data in self.source_datapipe:
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_typing.py", line 366, in wrap_generator
    response = gen.send(None)
  File "/usr/local/lib/python3.8/dist-packages/torchdata/datapipes/iter/util/hashchecker.py", line 62, in __iter__
    for file_name, data in self.source_datapipe:
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_typing.py", line 366, in wrap_generator
    response = gen.send(None)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/datapipes/iter/callable.py", line 112, in __iter__
    for data in self.datapipe:
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_typing.py", line 366, in wrap_generator
    response = gen.send(None)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/datapipes/iter/callable.py", line 112, in __iter__
    for data in self.datapipe:
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_typing.py", line 366, in wrap_generator
    response = gen.send(None)
  File "/usr/local/lib/python3.8/dist-packages/torchdata/datapipes/iter/load/online.py", line 121, in __iter__
    yield _get_response_from_google_drive(url, timeout=self.timeout)
  File "/usr/local/lib/python3.8/dist-packages/torchdata/datapipes/iter/load/online.py", line 85, in _get_response_from_google_drive
    raise RuntimeError("Internal error: headers don't contain content-disposition.")
RuntimeError: Internal error: headers don't contain content-disposition.

On another VM, where that error does not occur, I get a different error:

Error

So I tried following another article in order to classify the new sample for DBpedia: https://developpaper.com/pytorch-text-classification-based-on-torchtext/

but it seems to be outdated, so I didn't make any progress there either.

In short, I'd like to write code that runs inference on a new text sample, for both DBpedia and YahooAnswers, and prints its predicted class for the user, but none of my attempts so far has worked. If successful, the result for the sample above should be "This text belongs to NaturalPlace class", per the tutorial linked earlier. Once it works for DBpedia, the same code should work for YahooAnswers as well.

Thanks in advance and best regards ~Harel



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
