Regular expression in Python to extract noun phrases but exclude adjectives like 'other'

This is the code where I define a noun phrase chunking method:

import nltk
from nltk import word_tokenize, pos_tag

def np_chunking(sentence):
    # Chunker rules: adjectives followed by nouns, or one or more nouns
    grammar = "NP: {<JJ>*<NN.*>+}\n{<NN.*>+}"
    cp = nltk.RegexpParser(grammar)
    # Return the chunk tree so it can be printed or passed on; call .draw() on it to display it
    return cp.parse(pos_tag(word_tokenize(sentence)))

It works for a sentence like this:

print(np_chunking("""I like to listen to music from musical genres,such as blues,rock and jazz."""))

But when I change the text to another sentence, such as

print(np_chunking("""He likes to play basketball,football and other sports."""))

I do want to extract noun phrase chunks with a structure like adjective plus noun, or multiple nouns. But in the second example, the word 'other' is part of the structure 'np_1, np_2 and other np_3'. What follows 'and other' is often a hypernym. The second part of my code extracts the hyponyms:

import re

# merge_NP and prepare_chunks are helper functions that are not shown here.
def hyponym_extract(prepared_text, hearst_patterns):
    text = merge_NP(prepared_text)
    result = []
    # Pattern 0 ("such as"): the first NP is the hypernym, the rest are hyponyms.
    matches = re.search(hearst_patterns[0][0], text)
    if matches is not None:
        NP_match = re.findall(r"NP_\w+", matches.group(0))
        result = [(NP_match[0], x) for x in NP_match[1:]]
    # Pattern 1 ("and/or other"): the last NP is the hypernym, the rest are hyponyms.
    matches = re.search(hearst_patterns[1][0], text)
    if matches is not None:
        NP_match = re.findall(r"NP_\w+", matches.group(0))
        result = [(NP_match[-1], x) for x in NP_match[:-1]]
    return result

# Two example Hearst patterns
hearst_patterns = [(r"(NP_\w+ (, )?such as (NP_\w+ ?(, )?(and |or )?)+)", "first"),
                   (r"((NP_\w+ ?(, )?)+(and |or )?other NP_\w+)", "last")]
print(hyponym_extract(prepare_chunks(np_chunking("I like to listen to music from musical genres,such as blues,rock and jazz.")), hearst_patterns))
print(hyponym_extract(prepare_chunks(np_chunking("He likes to play basketball,football and other sports.")), hearst_patterns))
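
To make the mismatch concrete, here is a minimal sketch that uses only `re` and the second pattern from above. Since `prepare_chunks` and `merge_NP` are not shown, the two merged strings below are assumptions about what they produce: one where 'other' stays outside the chunk, and one where the chunker has absorbed it as an adjective. The helper name `and_other_pairs` is hypothetical.

```python
import re

# Second Hearst pattern from the question: "NP_1, NP_2 and other NP_3"
AND_OTHER = r"((NP_\w+ ?(, )?)+(and |or )?other NP_\w+)"

def and_other_pairs(text):
    """Return (hypernym, hyponym) pairs, or [] when the pattern does not match."""
    match = re.search(AND_OTHER, text)
    if match is None:
        return []
    nps = re.findall(r"NP_\w+", match.group(0))
    # The last NP is the hypernym; every NP before it is a hyponym.
    return [(nps[-1], hyponym) for hyponym in nps[:-1]]

# Assumed merged form when "other" stays outside the noun phrase chunk
separate = "NP_basketball , NP_football and other NP_sports"
# Assumed merged form when the chunker absorbs "other" as an adjective
absorbed = "NP_basketball , NP_football and NP_other_sports"

print(and_other_pairs(separate))  # [('NP_sports', 'NP_basketball'), ('NP_sports', 'NP_football')]
print(and_other_pairs(absorbed))  # []
```

Under these assumptions, the pattern only fires when the literal word 'other' survives between the chunks, which is why the `<JJ>*` rule gets in the way.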

The word 'other' is part of the Hearst pattern used to extract the hypernym and its hyponyms. So how can I improve my first piece of code so that the second one works correctly?
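
One possible direction, offered as a sketch rather than a confirmed fix: give the pattern word its own tag before chunking, so that `<JJ>*` in the grammar can no longer absorb it. The helper `retag_pattern_words` and the tag name `OTHER` are hypothetical, and the tagged list is hand-written to stand in for the output of `pos_tag(word_tokenize(...))`.

```python
def retag_pattern_words(tagged, words=("other",), new_tag="OTHER"):
    """Replace the POS tag of selected words so the chunk grammar skips them."""
    return [(word, new_tag) if word.lower() in words else (word, tag)
            for word, tag in tagged]

# Hand-written tags (an assumption), standing in for real tagger output
tagged = [("He", "PRP"), ("likes", "VBZ"), ("to", "TO"), ("play", "VB"),
          ("basketball", "NN"), (",", ","), ("football", "NN"),
          ("and", "CC"), ("other", "JJ"), ("sports", "NNS"), (".", ".")]

retagged = retag_pattern_words(tagged)
print(retagged[8])  # ('other', 'OTHER')
print(retagged[9])  # ('sports', 'NNS')
```

The retagged list could then be passed to `cp.parse(...)` in place of `pos_tag(word_tokenize(sen))`. Because `OTHER` matches neither `<JJ>` nor `<NN.*>`, 'other' should be left outside the NP chunks. Whether `prepare_chunks` then keeps it as a plain word in the merged text depends on code that is not shown.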

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
