How to obtain the full GPE for named entity recognition using NLTK? It misses the full name or full city
How do you fix duplicated names, obtain full names, and fix location errors during NER with NLTK?
import nltk
from nltk import ne_chunk, pos_tag, word_tokenize
sentence = 'Mark, Anitha and Ann Hathway are working at Crazybook. Mark Anthony arrived from Ghana and the second person moved from India to Crazy Bel Technologies in San diego before arriving here in Mountain View'
for sent in nltk.sent_tokenize(sentence):
    for chunk in nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sent))):
        # Only named-entity subtrees have a label; plain tokens are (word, tag) tuples
        if hasattr(chunk, 'label'):
            print(chunk.label(), ' '.join(c[0] for c in chunk))
PERSON Mark
PERSON Anitha
PERSON Ann Hathway
ORGANIZATION Crazybook
PERSON Mark
PERSON Anthony
GPE Ghana
GPE India
PERSON Crazy Bel
GPE San
GPE Mountain
Issue #1: as the output shows, the first Mark, the second Mark, and Anthony are reported as separate PERSON entities, although in context they all refer to the same person. How do you detect this?
Issue #2: Crazy Bel Technologies is not detected as an ORGANIZATION; only Crazy Bel is found, and it is labelled PERSON.
Issue #3: San Diego is not detected as a GPE, only San is; similarly, only Mountain is detected instead of Mountain View.
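To make the three issues concrete, here is a minimal post-processing sketch in plain Python. It is not an NLTK feature and is not guaranteed to fix the chunker's mistakes in general. It assumes the `ne_chunk` output has already been flattened into `(label, text)` pairs in sentence order, with `None` as the label for ordinary tokens. The function names `merge_adjacent`, `link_aliases`, and `match_gazetteer` are hypothetical, and so is the hand-written list of known names used as a gazetteer.

```python
def merge_adjacent(chunks):
    """Merge neighbouring chunks that share an entity label (e.g. Mark + Anthony).

    `chunks` is an assumed flattening of the chunker output: a list of
    (label, text) pairs in sentence order, with label None for plain tokens.
    """
    merged = []
    for label, text in chunks:
        if label is not None and merged and merged[-1][0] == label:
            merged[-1] = (label, merged[-1][1] + " " + text)
        else:
            merged.append((label, text))
    # Drop the plain tokens; they were only needed to detect adjacency
    return [(label, text) for label, text in merged if label is not None]


def link_aliases(entities, label="PERSON"):
    """Map each name to the longest same-label name that contains all its words."""
    names = [text for lab, text in entities if lab == label]
    canonical = {}
    for name in names:
        words = set(name.split())
        candidates = [n for n in names if words <= set(n.split())]
        canonical[name] = max(candidates, key=lambda n: len(n.split()))
    return canonical


def match_gazetteer(tokens, gazetteer, label="GPE"):
    """Find known multi-word names in a token list, ignoring case."""
    lowered = [t.lower() for t in tokens]
    found = []
    for name in gazetteer:
        parts = name.lower().split()
        for i in range(len(lowered) - len(parts) + 1):
            if lowered[i:i + len(parts)] == parts:
                found.append((label, name))
                break
    return found


# Issue #1: join the split name, then link the earlier bare "Mark" to it
chunks = [("PERSON", "Mark"), ("PERSON", "Anthony"), (None, "arrived"),
          (None, "from"), ("GPE", "Ghana")]
entities = merge_adjacent(chunks)
aliases = link_aliases([("PERSON", "Mark"), ("PERSON", "Anitha")] + entities)

# Issues #2 and #3: look up names from a hand-made (hypothetical) list
tokens = "moved to San diego before arriving here in Mountain View".split()
places = match_gazetteer(tokens, ["San Diego", "Mountain View"])
orgs = match_gazetteer("from India to Crazy Bel Technologies".split(),
                       ["Crazy Bel Technologies"], label="ORGANIZATION")

print(entities, aliases, places, orgs)
```

The alias step is only a word-overlap heuristic: it would also merge two different people who happen to share a first name, so real coreference needs more context than this. The gazetteer lookup only finds names that are already on the list, and it matches case-insensitively so that the lower-case "diego" in the sample sentence still matches.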
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
