Regular expression in Python to extract noun phrases but exclude adjectives like 'other'
This is the code where I define a noun phrase chunking method:
```python
import nltk
from nltk import word_tokenize, pos_tag


def np_chunking(sentence):
    # Chunker rules: adjective(s) + noun(s), or one or more nouns
    grammar = "NP: {<JJ>*<NN.*>+}\n{<NN.*>+}"
    cp = nltk.RegexpParser(grammar)
    mychunk = cp.parse(pos_tag(word_tokenize(sentence)))
    # Return the tree itself (draw() returns None); call mychunk.draw() to display it
    return mychunk
```
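NLTK's `RegexpParser` applies the grammar to the sequence of POS tags, not to the words themselves. As a rough illustration (a pure-`re` stand-in, not NLTK's actual implementation), the rule `<JJ>*<NN.*>+` can be emulated by encoding each tag as `<TAG>` and running a regular expression over that string. `chunk_nps` is a hypothetical helper, and the tags are written by hand as an assumption; `pos_tag` may assign different ones.

```python
import re

# Pure-re sketch of the chunk rule <JJ>*<NN.*>+ (not NLTK's implementation):
# zero or more <JJ> tags, then one or more tags starting with NN.
NP_RULE = re.compile(r"(?:<JJ>)*(?:<NN[^<>]*>)+")


def chunk_nps(tagged_tokens, rule=NP_RULE):
    """Return the words of each noun-phrase chunk found in (word, tag) pairs."""
    # Encode the tags as one string, e.g. "<PRP><VBP><TO>..."
    tag_string = "".join(f"<{tag}>" for _, tag in tagged_tokens)
    chunks = []
    for m in rule.finditer(tag_string):
        # Every tag starts with "<", so counting "<" maps offsets to token indices.
        start = tag_string.count("<", 0, m.start())
        end = start + m.group(0).count("<")
        chunks.append(" ".join(word for word, _ in tagged_tokens[start:end]))
    return chunks


# Hand-written tags (an assumption about what pos_tag returns).
sentence1 = [("I", "PRP"), ("like", "VBP"), ("to", "TO"), ("listen", "VB"),
             ("to", "TO"), ("music", "NN"), ("from", "IN"), ("musical", "JJ"),
             ("genres", "NNS"), (",", ","), ("such", "JJ"), ("as", "IN"),
             ("blues", "NN"), (",", ","), ("rock", "NN"), ("and", "CC"),
             ("jazz", "NN"), (".", ".")]

print(chunk_nps(sentence1))  # ['music', 'musical genres', 'blues', 'rock', 'jazz']
```

Note that "such" (tagged `JJ` here) is not chunked, because the rule requires at least one noun tag after any adjectives.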
It works for this sentence:

```python
print(np_chunking("""I like to listen to music from musical genres,such as blues,rock and jazz."""))
```
But when I change the text to another sentence, such as

```python
print(np_chunking("""He likes to play basketball,football and other sports."""))
```
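I want to extract noun phrase chunks whose structure is either an adjective plus a noun or multiple nouns. But in the second example, the word 'other' is part of the structure 'NP_1, NP_2 and other NP_3', where the noun phrase after 'and other' is usually a hypernym. Because 'other' is an adjective, my grammar puts it inside the chunk ('other sports'), which breaks that structure.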
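One way to keep 'other' out of the chunk, sketched here under stated assumptions, is to re-tag it before chunking. Since the chunk rules only look at tags, giving 'other' a made-up tag (`OTHER`, a hypothetical name) means neither `<JJ>` nor `<NN.*>` can match it. `mark_excluded` and `EXCLUDED_WORDS` are hypothetical helpers, `chunk_nps` is a pure-`re` stand-in for `RegexpParser` (not NLTK's implementation), and the tags are hand-written.

```python
import re

# Pure-re stand-in for the chunk rule <JJ>*<NN.*>+ (not NLTK's implementation).
NP_RULE = re.compile(r"(?:<JJ>)*(?:<NN[^<>]*>)+")

# Hypothetical list of words to keep out of noun phrases; extend as needed.
EXCLUDED_WORDS = {"other"}


def mark_excluded(tagged_tokens, excluded=EXCLUDED_WORDS, new_tag="OTHER"):
    """Copy (word, tag) pairs, giving excluded words a made-up tag.

    No chunk rule mentions "OTHER", so those words cannot join an NP chunk.
    """
    return [(word, new_tag if word.lower() in excluded else tag)
            for word, tag in tagged_tokens]


def chunk_nps(tagged_tokens, rule=NP_RULE):
    """Return the words of each noun-phrase chunk found in (word, tag) pairs."""
    tag_string = "".join(f"<{tag}>" for _, tag in tagged_tokens)
    chunks = []
    for m in rule.finditer(tag_string):
        # Map character offsets back to token indices by counting "<".
        start = tag_string.count("<", 0, m.start())
        end = start + m.group(0).count("<")
        chunks.append(" ".join(word for word, _ in tagged_tokens[start:end]))
    return chunks


# Hand-written tags (an assumption about what pos_tag returns).
sentence2 = [("He", "PRP"), ("likes", "VBZ"), ("to", "TO"), ("play", "VB"),
             ("basketball", "NN"), (",", ","), ("football", "NN"),
             ("and", "CC"), ("other", "JJ"), ("sports", "NNS"), (".", ".")]

print(chunk_nps(sentence2))                 # ['basketball', 'football', 'other sports']
print(chunk_nps(mark_excluded(sentence2)))  # ['basketball', 'football', 'sports']
```

In `np_chunking`, the same helper would wrap the tagger output, i.e. `cp.parse(mark_excluded(pos_tag(word_tokenize(sentence))))`; because `RegexpParser` also matches on tags, 'other' should then stay outside the `NP` chunk. The trade-off is that later steps see a non-Penn-Treebank tag for those words.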
In the second part, I extract hypernym/hyponym pairs with Hearst patterns:
```python
import re


def hyponym_extract(prepared_text, hearst_patterns):
    # merge_NP and prepare_chunks are my helper functions (not shown here)
    text = merge_NP(prepared_text)
    result = []
    # "NP_x such as NP_y, NP_z ...": the first NP is the hypernym
    matches = re.search(hearst_patterns[0][0], text)
    if matches is not None:
        NP_match = re.findall(r"NP_\w+", matches.group(0))
        hyponyms = NP_match[1:]
        result = [(NP_match[0], x) for x in hyponyms]
    # "NP_x, NP_y and other NP_z": the last NP is the hypernym
    matches = re.search(hearst_patterns[1][0], text)
    if matches is not None:
        NP_match = re.findall(r"NP_\w+", matches.group(0))
        hyponyms = NP_match[:-1]
        result = [(NP_match[-1], x) for x in hyponyms]
    return result


# Two example Hearst patterns
hearst_patterns = [
    (r"(NP_\w+ (, )?such as (NP_\w+ ?(, )?(and |or )?)+)", "first"),
    (r"((NP_\w+ ?(, )?)+(and |or )?other NP_\w+)", "last"),
]

print(hyponym_extract(prepare_chunks(np_chunking("I like to listen to music from musical genres,such as blues,rock and jazz.")), hearst_patterns))
print(hyponym_extract(prepare_chunks(np_chunking("He likes to play basketball,football and other sports.")), hearst_patterns))
```
The word 'other' is part of the Hearst pattern used to extract hypernyms and hyponyms. So how could I improve my first function so that the second example works correctly?
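To see why the chunking matters for the second part, the 'and other' pattern can be run on merged text directly. The format assumed below for `merge_NP`'s output (each chunk becomes `NP_` plus its words joined by underscores) is a guess, since `merge_NP` is not shown; `and_other_pairs` is a hypothetical, condensed version of the second branch of `hyponym_extract`. When 'other' is swallowed into the chunk, the literal `other NP_` never appears, so the pattern finds nothing.

```python
import re

# The second Hearst pattern from the question, as a raw string.
AND_OTHER = r"((NP_\w+ ?(, )?)+(and |or )?other NP_\w+)"


def and_other_pairs(text):
    """Return (hypernym, hyponym) pairs for 'NP_x, NP_y and other NP_z'."""
    match = re.search(AND_OTHER, text)
    if match is None:
        return []
    nps = re.findall(r"NP_\w+", match.group(0))
    # The last NP is the hypernym; the ones before it are hyponyms.
    return [(nps[-1], np) for np in nps[:-1]]


# Hypothetical merge_NP outputs for the second sentence.
fixed = "He likes to play NP_basketball , NP_football and other NP_sports ."
broken = "He likes to play NP_basketball , NP_football and NP_other_sports ."

print(and_other_pairs(fixed))   # [('NP_sports', 'NP_basketball'), ('NP_sports', 'NP_football')]
print(and_other_pairs(broken))  # []
```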
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow

