Getting wrong noun chunks with spaCy doc.noun_chunks

I use spaCy's en_core_web_trf model and doc.noun_chunks to extract noun chunks. This used to work well and returned the correct chunks. Recently, however, since around mid-December 2021 (I guess), the same script no longer returns the correct noun chunks. For example, take the following script:

import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_trf")
test_sen = "a label on a box that ensures that the status of a parcel can be traced uniquely "
doc = nlp(test_sen)
# ---------------------get initial noun chunks---------------------
for chunk in doc.noun_chunks:
    print(chunk)
# displacy.serve(doc, style='dep')

I get the following result:

a label
a box
that
the status
a parcel

However, according to the dependency graph (see below), the first "that" should not be a noun chunk, yet it is printed as one. The spaCy description of Doc.noun_chunks also says that it "yields base noun-phrase Span objects". Either way, "that" should not count as a noun chunk, but it is identified as one here, which causes a lot of trouble in my later processing.

Does anyone have a hint on how to fix it? Thanks!
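
One possible workaround is to filter the chunks after extraction. This is only a sketch, and it assumes that the stray "that" is a single-token chunk whose root token carries the coarse part-of-speech tag PRON. Check that first by printing chunk.root.pos_ for each chunk. The helper name drop_pronoun_chunks and its excluded_pos parameter are hypothetical and not part of spaCy; only Span.root and Token.pos_ come from the library.

```python
def drop_pronoun_chunks(chunks, excluded_pos=("PRON",)):
    """Return the chunks whose root token is not tagged with an excluded POS.

    `chunks` is any iterable of spaCy Span objects, e.g. doc.noun_chunks.
    `excluded_pos` is a hypothetical parameter: coarse POS labels to drop.
    """
    # Span.root is the syntactic head token of the span;
    # Token.pos_ is its coarse-grained part-of-speech label.
    return [chunk for chunk in chunks if chunk.root.pos_ not in excluded_pos]
```

In the script above, this would be used as `for chunk in drop_pronoun_chunks(doc.noun_chunks): print(chunk)`. Note that filtering on PRON also drops personal pronouns such as "it" or "they", so a narrower check (for example on chunk.root.tag_) may be a better fit if those need to be kept.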

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
