How can I tune neuralcoref to get better coreference results?
I'm using neuralcoref, a coreference resolution module built on the spaCy parser (https://github.com/huggingface/neuralcoref).
However, the results I'm getting could be improved. The online visualizer provided by huggingface (developer of neuralcoref) gives me more accurate results.
The text I'm analyzing: "London is the capital and most populous city of England and the United Kingdom. Standing on the River Thames in the south east of the island of Great Britain, it has been a major settlement for two millennia."
I get this result:
doc._.coref_resolved
London is the capital and most populous city of England and the United Kingdom. Standing on the River Thames in the south east of the island of Great Britain, the River Thames has been a major settlement for two millennia.
So it mistakenly resolves "it" to the River Thames instead of London (it -> River Thames).
The neuralcoref online visualizer returns the correct link (it -> London).
I have already tried tuning parameters such as greedyness and max_dist, which are described on the project's GitHub page (https://github.com/huggingface/neuralcoref). This is the code I'm running:
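import spacy
import neuralcoref

# Load the large English model and add neuralcoref to the pipeline.
# store_scores=True keeps the pairwise scores available on the doc.
nlp = spacy.load('en_core_web_lg')
neuralcoref.add_to_pipe(nlp, greedyness=0.5, store_scores=True)

text = "London is the capital and most populous city of England and the United Kingdom. Standing on the River Thames in the south east of the island of Great Britain, it has been a major settlement for two millennia."
# Commented-out continuation of the text: It was founded by the Romans, who named it Londinium.
doc = nlp(text)

print(doc._.coref_resolved)  # text with mentions replaced by their main mention
print(doc._.coref_scores)    # scores between mentions and candidate antecedents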
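Since store_scores=True is set, one way to see why "it" ends up attached to the River Thames is to rank the candidate antecedents for that mention by score. The following is only a minimal sketch: it assumes doc._.coref_scores is a dict that maps each mention to a dict of {antecedent: score}, as the project's README describes, and rank_antecedents is a hypothetical helper name, not part of neuralcoref.

```python
from typing import List, Mapping, Tuple


def rank_antecedents(scores: Mapping, mention_text: str, top_n: int = 3) -> List[Tuple[str, float]]:
    """Return the top_n (antecedent text, score) pairs for the first mention
    whose text equals mention_text, highest score first.

    Keys are compared via str(), so this works with spaCy Span keys
    (assumed shape of doc._.coref_scores) as well as plain strings.
    """
    for mention, candidates in scores.items():
        if str(mention) == mention_text:
            ranked = sorted(
                ((str(antecedent), float(score)) for antecedent, score in candidates.items()),
                key=lambda pair: pair[1],
                reverse=True,
            )
            return ranked[:top_n]
    # No mention with that text was found.
    return []
```

With the pipeline above, a call such as rank_antecedents(doc._.coref_scores, "it") would show how close the score for London is to the score for the River Thames, which gives a rough idea of whether a parameter change could plausibly flip the link.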
Is there a way to tune it to get results similar to those from the visualizer?
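To compare settings side by side, one option is a small sweep over greedyness (and, if desired, max_dist). This is a sketch under assumptions, not a recommended configuration: resolve_with_neuralcoref and sweep_greedyness are hypothetical helper names, the only neuralcoref calls used are add_to_pipe with the greedyness and max_dist arguments listed on the project page, and the model is reloaded for every value so that the component is never added twice to the same pipeline.

```python
from typing import Callable, Dict, Iterable


def resolve_with_neuralcoref(text: str, greedyness: float, max_dist: int = 50) -> str:
    """Build a fresh pipeline with the given settings and return the resolved text."""
    # Imported lazily so sweep_greedyness can be tested without spaCy installed.
    import spacy
    import neuralcoref

    nlp = spacy.load("en_core_web_lg")
    neuralcoref.add_to_pipe(nlp, greedyness=greedyness, max_dist=max_dist)
    return nlp(text)._.coref_resolved


def sweep_greedyness(
    text: str,
    values: Iterable[float],
    resolve: Callable[[str, float], str] = resolve_with_neuralcoref,
) -> Dict[float, str]:
    """Resolve `text` once per greedyness value and collect the outputs."""
    return {g: resolve(text, g) for g in values}
```

Calling sweep_greedyness(text, [0.4, 0.45, 0.5, 0.55, 0.6]) and printing each entry makes it easy to see at which value, if any, "it" switches from the River Thames to London. Reloading the model on every step is slow, so this is meant for a one-off comparison only.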
Thank you!
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
