How to get De-Duplicated OpenIE (Clause Extraction) Results?
I've exhausted all the configuration options I'm aware of:
from openie import StanfordOpenIE
# https://stanfordnlp.github.io/CoreNLP/openie.html#api
# Default value of openie.affinity_probability_cap was 1/3.
properties = {
    "annotators": "tokenize,ssplit,pos,depparse,natlog,openie",
    'openie.affinity_probability_cap': 2 / 3,
    "openie.triple.strict": "true",
    'openie.max_entailments_per_clause': 1,
    'splitter.disable': True
}
with StanfordOpenIE(properties=properties) as client:
    text = 'Barack Obama was born in Hawaii. Richard Manning wrote this sentence.'
    print('Text: %s.' % text)
    for triple in client.annotate(text):  #, max_entailments_per_clause=True):
        print('|-', triple)
But the results still contain unmerged, overlapping variants of the same clause:
|- {'subject': 'Barack Obama', 'relation': 'was', 'object': 'born'}
|- {'subject': 'Barack Obama', 'relation': 'was born in', 'object': 'Hawaii'}
Whereas I'm only looking for the maximal clause extraction results:
|- {'subject': 'Barack Obama', 'relation': 'was born in', 'object': 'Hawaii'}
Can someone help me out on this please?
Solution 1:[1]
This code worked for me.
from pycorenlp import StanfordCoreNLP
import json
import nltk
# Requires a CoreNLP server already listening on localhost:9000.
nlp = StanfordCoreNLP("http://localhost:9000/")
text = 'Barack Obama was born in Hawaii. Richard Manning wrote this sentence.'
props = {"annotators": "tokenize,ssplit,pos,depparse,natlog,openie",
         "outputFormat": "json",
         "openie.triple.strict": "true",
         "openie.max_entailments_per_clause": "1"}
# Annotate one sentence at a time so that index [0] is always the right one.
sentences = nltk.sent_tokenize(text)
for sent in sentences:
    print(sent)
    output = nlp.annotate(sent, properties=props)
    # Some pycorenlp versions return an already-parsed dict for JSON output.
    j_data = json.loads(output) if isinstance(output, str) else output
    openie = j_data['sentences'][0]['openie']
    # Each entry is one extracted triple; print it once.
    for i in openie:
        relationSen = i['subject'], i['relation'], i['object']
        print(relationSen)
It produces the following output:
Barack Obama was born in Hawaii.
('Barack Obama', 'was born in', 'Hawaii')
Richard Manning wrote this sentence.
('Richard Manning', 'wrote', 'sentence')
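If the server-side options still leave overlapping triples in your setup, one possible fallback is to filter them on the client after extraction. The following is only a minimal sketch and not part of the original answer: the helper names triple_tokens, is_subsequence and keep_maximal are hypothetical, and it assumes each triple is a dict with 'subject', 'relation' and 'object' keys, as in the output shown in the question. A triple is treated as redundant when its words appear, in the same order, inside another triple.

```python
def triple_tokens(triple):
    """Flatten a triple into a tuple of lowercase word tokens."""
    text = ' '.join((triple['subject'], triple['relation'], triple['object']))
    return tuple(text.lower().split())


def is_subsequence(short, long):
    """Return True if every token of `short` occurs in `long` in the same order."""
    remaining = iter(long)
    # `tok in remaining` advances the iterator, so order is respected.
    return all(tok in remaining for tok in short)


def keep_maximal(triples):
    """Drop exact duplicates and triples whose words are contained in another triple."""
    flat = [triple_tokens(t) for t in triples]
    kept = []
    for i, triple in enumerate(triples):
        duplicate = flat[i] in flat[:i]  # keep only the first exact copy
        subsumed = any(
            flat[i] != flat[j] and is_subsequence(flat[i], flat[j])
            for j in range(len(triples))
        )
        if not duplicate and not subsumed:
            kept.append(triple)
    return kept


# Triples in the shape printed in the question (third one added for illustration).
triples = [
    {'subject': 'Barack Obama', 'relation': 'was', 'object': 'born'},
    {'subject': 'Barack Obama', 'relation': 'was born in', 'object': 'Hawaii'},
    {'subject': 'Richard Manning', 'relation': 'wrote', 'object': 'sentence'},
]
result = keep_maximal(triples)
for triple in result:
    print('|-', triple)
```

This heuristic is deliberately crude: it compares whitespace-separated words only, so it may also discard a shorter triple that states a genuinely separate fact. Whether that trade-off is acceptable depends on the text being processed.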
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Anthony DiDonato |