How to exclude sentences from spaCy results if they contain a token with a specific dep_?
I would like to negatively filter spaCy results. Specifically, I want to get sentences whose dependency parse includes 'pobj' but not 'dobj'. However, sentences with 'dobj' are likely to include 'pobj' as well (but not vice versa), so the results also list the sentences that contain 'dobj'.
For instance:
'He pushed the book off the shelf':

```
He      nsubj
pushed  ROOT
the     det
book    dobj
off     prep
the     det
shelf   pobj
```
'The book fell off the table':

```
The     det
book    nsubj
fell    ROOT
off     prep
the     det
table   pobj
```
Both sentences contain a pobj token, and in each one a prep token is its immediate head, so the following code
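```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this English model is installed
doc = nlp("He pushed the book off the shelf. The book fell off the table")
for t in doc:
    if t.dep_ == "pobj":
        print(t.sent)
```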
returns both sentences. How can I filter correctly so that sentences containing both 'dobj' and 'pobj' are excluded, and only sentences that contain 'pobj' but no 'dobj' are listed?
Solution 1:[1]
Well, after many attempts, I found the following solution:
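```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this English model is installed
doc = nlp("He pushed the book off the shelf. The book fell off the table")

pattern2_sents = []  # sentences that contain a 'dobj'
pattern4_sents = []  # sentences with a 'pobj' but no 'dobj'
for a in doc:
    # a preposition (ADP) attached to a verb...
    if a.dep_ == "prep" and a.pos_ == "ADP" and a.head.pos_ == "VERB":
        for b in a.head.children:
            # ...where that verb has a nominal subject
            if b.dep_ == "nsubj":
                n = a.sent
                # test the whole sentence, not one token at a time
                if any(c.dep_ == "dobj" for c in n):
                    pattern2_sents.append(n)
                elif any(c.dep_ == "pobj" for c in n):
                    pattern4_sents.append(n)
```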
However, I am not sure why simply iterating with `if token.dep_ != 'dobj'` did not work for the original question.
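A per-token test like `token.dep_ != 'dobj'` is true for every token except the dobj itself (He, pushed, the, off, ...), so it can never reject a whole sentence; the condition has to be checked once per sentence, over all of its labels. The sketch below is an alternative to the nested loops above, not the answerer's code: a hypothetical helper `keep_pobj_without_dobj` collects each sentence's `dep_` labels into a set and keeps the sentence only if 'pobj' is present and 'dobj' is absent. So that it runs without spaCy, it uses `SimpleNamespace` stand-ins built from the two parses listed in the question; those labels are copied from the question, not produced by a parser.

```python
from types import SimpleNamespace
from typing import Iterable, List


def keep_pobj_without_dobj(sents: Iterable) -> List:
    """Return the sentences that contain a 'pobj' token and no 'dobj' token.

    Each sentence only needs to be an iterable of tokens with a ``dep_``
    attribute, so spaCy's ``doc.sents`` (which yields Span objects) also works.
    """
    kept = []
    for sent in sents:
        deps = {tok.dep_ for tok in sent}  # every dependency label in this sentence
        if "pobj" in deps and "dobj" not in deps:
            kept.append(sent)
    return kept


def make_sent(pairs):
    # Hypothetical stand-in for a spaCy Span: a list of objects with .text and .dep_
    return [SimpleNamespace(text=t, dep_=d) for t, d in pairs]


s1 = make_sent([("He", "nsubj"), ("pushed", "ROOT"), ("the", "det"),
                ("book", "dobj"), ("off", "prep"), ("the", "det"),
                ("shelf", "pobj")])
s2 = make_sent([("The", "det"), ("book", "nsubj"), ("fell", "ROOT"),
                ("off", "prep"), ("the", "det"), ("table", "pobj")])

# The per-token test is True for almost every token, so it cannot reject s1
print([tok.dep_ != "dobj" for tok in s1])
# -> [True, True, True, False, True, True, True]

for sent in keep_pobj_without_dobj([s1, s2]):
    print(" ".join(tok.text for tok in sent))
# -> The book fell off the table
```

With a real parse, the same helper can be called as `keep_pobj_without_dobj(doc.sents)` and returns spaCy `Span` objects; `doc.sents` needs a pipeline that sets sentence boundaries, such as one that includes the dependency parser. Note that this filters whole sentences: a sentence with two clauses, one with a dobj and one without, is dropped entirely.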
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Fatih Bozda? |
