How to load data for only certain labels of spaCy's NER entities?

I just started exploring spaCy and need it only for the GPE (geopolitical entity) label of the named entity recognition (NER) component.

So, to save time on loading, I disable every pipeline component except 'ner':

    nlp = spacy.load('en_core_web_sm', disable=['tok2vec','tagger','parser', 'senter', 'attribute_ruler', 'lemmatizer'])

Then I create a set of cities / states / countries that exist in the text by running:

    doc = nlp(txt)
    geo_ents = {ent.text for ent in doc.ents if ent.label_ == 'GPE'}

So I only need the small subset of entities whose label_ is 'GPE'. I haven't yet found a way to run only that part of the model, which would reduce runtime on large volumes of text.
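For reference, here is a minimal sketch of the filtering approach applied to many texts at once. As far as I can tell, the pretrained `ner` component predicts all of its labels jointly, so this sketch assumes there is no load-time option to keep only 'GPE' and simply filters after prediction. It uses `nlp.pipe`, which processes texts in batches. The helper name `extract_gpe` and the `batch_size` default are my own choices for illustration, not part of spaCy.

```python
from typing import Iterable, Set

from spacy.language import Language


def extract_gpe(nlp: Language, texts: Iterable[str], batch_size: int = 256) -> Set[str]:
    """Collect the text of every entity labelled 'GPE' across all texts.

    `extract_gpe` is a hypothetical helper; `batch_size=256` is an
    arbitrary starting point that should be tuned for the actual data.
    """
    geo_ents: Set[str] = set()
    # nlp.pipe streams the texts through the pipeline in batches, which is
    # usually faster than calling nlp(text) once per text in a loop.
    for doc in nlp.pipe(texts, batch_size=batch_size):
        # The NER component still predicts every label; keep only GPE spans.
        geo_ents.update(ent.text for ent in doc.ents if ent.label_ == "GPE")
    return geo_ents
```

With the `nlp` object loaded as above, this would be called as `geo_ents = extract_gpe(nlp, list_of_texts)`, where `list_of_texts` stands for any iterable of strings.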

Could you please show me how to load only certain labels of spaCy's NER entities? This might also help others who need only selected entity types.

Thank you very much!



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
