Iterate over spaCy tokens and extract the BILOU tags
How should I annotate the following sentence with BILOU tags?
I have a function called get_dataset2 that should return the tokens, POS tags and BILOU tags for a sentence, but I am stuck on the BILOU tags.
Function:
    def get_dataset2(sent):
        head_entity = ""
        candidate_entity = ""
        prv_tok_dep = ""
        prv_tok_text = ""
        prefix = ""
        words_ = []
        label_ = []
        tags_ = []
        doc = nlp(sent)
        for tok in doc:
            words_.append(tok.text)
            label_.append(tok.pos_)
            if(tok.text=='JUDGMENT'):
                tags_.append('O')
                next_token1 = doc[tok.i+1]
                #next_tok_loc1 = tok.i+1
                next_token2 = doc[tok.i+2]
                #next_tok_loc2 = tok.i+2
            if(tok.text==next_token1 and (next_token2.pos_=='PUNCT' or next_token2.pos_=='NUM')):
                tags_.append('U-Parties')
                #if(next_token1.pos_=='PROPN' and next_token2.pos_=='PROPN'):
                    #tags_.append('U-Parties')
            else:
                tags_.append('O')
        return (pd.DataFrame({'Token': words_, 'POS': label_,'Tags': tags_}))
Problem: when I call get_dataset2('JUDGMENT Gajendragadkar, J. 1.'), the function extracts the tokens and POS tags successfully, but not the BILOU tags.
The output should look like this:
    Tokens          POS    BILOU Tags
    JUDGMENT        PROPN  O
    Gajendragadkar  PROPN  U-Parties
    ,               PUNCT  O
I want to iterate over the tokens so that, after JUDGMENT, I can look at the second and third tokens and then assign the BILOU tags: if the entity is a single token, it should get U-Parties.
Thanks!
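One possible way to approach this is sketched below. It is only a sketch under stated assumptions, not a confirmed fix. In the function above, tok.text (a string) is compared with next_token1 (a Token object) rather than with its .text, and tags_ does not receive exactly one entry per token, so the three lists cannot line up in the DataFrame. The sketch starts with one 'O' per token and then overwrites the token after JUDGMENT when the token after that is PUNCT or NUM. The names bilou_tags, trigger, label and the Tok stand-in are hypothetical and used for illustration only. Only the .text and .pos_ attributes are read, so a spaCy Doc could be passed in place of the sample list. Multi-token entities (B-, I-, L- tags) are not handled here.

```python
from collections import namedtuple

# Hypothetical stand-in for a spaCy token so the sketch runs without a model.
# Only .text and .pos_ are used, which spaCy tokens also provide.
Tok = namedtuple("Tok", ["text", "pos_"])


def bilou_tags(doc, trigger="JUDGMENT", label="Parties"):
    """Return one tag per token.

    The token right after `trigger` gets U-<label> when the token
    after it is tagged PUNCT or NUM; every other token gets 'O'.
    """
    tokens = list(doc)
    tags = ["O"] * len(tokens)  # exactly one tag per token from the start
    for i, tok in enumerate(tokens):
        if tok.text != trigger:
            continue
        # The trigger needs two tokens after it; skip otherwise
        # to avoid indexing past the end of the sentence.
        if i + 2 >= len(tokens):
            continue
        if tokens[i + 2].pos_ in ("PUNCT", "NUM"):
            tags[i + 1] = "U-" + label
    return tags


# The three tokens and POS tags listed in the expected output above.
sample = [
    Tok("JUDGMENT", "PROPN"),
    Tok("Gajendragadkar", "PROPN"),
    Tok(",", "PUNCT"),
]
print(bilou_tags(sample))
```

With spaCy loaded, the same function could presumably be called as bilou_tags(nlp(sent)), and the result placed in the 'Tags' column next to the token texts and POS tags, since all three lists then have the same length.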
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
