Getting wrong noun chunks using spaCy doc.noun_chunks
I use spaCy's en_core_web_trf model and doc.noun_chunks to get noun chunks. This used to work well and returned the noun chunks correctly. Recently, though (since around mid-December 2021, I guess), the same script, used the same way, no longer returns them correctly. For example, take the following script:
import spacy
from spacy import displacy
nlp = spacy.load("en_core_web_trf")
test_sen = "a label on a box that ensures that the status of a parcel can be traced uniquely "
doc = nlp(test_sen)
# ---------------------get initial noun chunks---------------------
for chunk in doc.noun_chunks:
    print(chunk)
# displacy.serve(doc, style='dep')
I get the following result:
a label
a box
that
the status
a parcel
But according to the dependency graph (see below), the first "that" should not be a noun chunk, yet it is printed as one. Also, the spaCy documentation describes Doc.noun_chunks as follows: "Yields base noun-phrase Span objects." Either way, "that" should not count as a noun chunk, but it is identified as one here, and this causes a lot of trouble in my later processing.
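One way to check why "that" is picked up is to look at the root token of each chunk. The following is only a diagnostic sketch: `describe_chunks` and `print_chunk_roots` are hypothetical helper names, and the sketch assumes the chunks are spaCy Span objects exposing `root`, with `pos_`, `tag_` and `dep_` on the root token.

```python
def describe_chunks(chunks):
    """Return (text, root POS, root tag, root dependency) for each chunk.

    `chunks` is expected to be an iterable of spaCy Span objects,
    for example doc.noun_chunks.
    """
    return [(c.text, c.root.pos_, c.root.tag_, c.root.dep_) for c in chunks]


def print_chunk_roots(text, model="en_core_web_trf"):
    """Load a pipeline and print the root-token details of every noun chunk."""
    import spacy  # imported here so describe_chunks has no hard dependency

    nlp = spacy.load(model)
    for row in describe_chunks(nlp(text).noun_chunks):
        print(row)
```

Calling `print_chunk_roots(test_sen)` prints one tuple per chunk. If the root of the "that" chunk comes back with a pronoun part of speech, that would likely explain the result: as far as I know, spaCy's English noun-chunk rules accept pronoun roots as well as nouns and proper nouns.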
Does anyone have a hint on how to fix it? Thanks!
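A possible workaround, sketched under assumptions rather than offered as a confirmed fix, is to filter the chunks after the fact and drop those whose root token is a wh-word or relative pronoun. `drop_wh_chunks` and `print_filtered_chunks` are hypothetical helper names, and the sketch assumes the model assigns the relative "that" one of the Penn Treebank tags WDT, WP or WP$.

```python
# Assumption: relative/wh-pronouns carry one of these fine-grained tags.
WH_TAGS = ("WDT", "WP", "WP$")


def drop_wh_chunks(chunks, excluded_tags=WH_TAGS):
    """Keep only the chunks whose root token's tag is not in excluded_tags."""
    return [c for c in chunks if c.root.tag_ not in excluded_tags]


def print_filtered_chunks(text, model="en_core_web_trf"):
    """Load a pipeline and print the noun chunks that survive the filter."""
    import spacy  # imported here so drop_wh_chunks has no hard dependency

    nlp = spacy.load(model)
    for chunk in drop_wh_chunks(nlp(text).noun_chunks):
        print(chunk.text)
```

Filtering on the fine-grained tag instead of the coarse part of speech keeps ordinary pronouns such as "it" in the result, which may or may not be what the later processing needs. If the model tags "that" differently, the `excluded_tags` tuple would have to be adjusted.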
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow

