Iterate over spaCy tokens and extract the BILOU tags

How should I annotate the following sentence with BILOU tags?

I have a function called get_dataset2 that should return the tokens, POS tags and BILOU tags for a sentence. The tokens and POS tags work, but I am stuck on the BILOU tags.

Function:

def get_dataset2(sent):
  head_entity = ""
  candidate_entity = ""

  prv_tok_dep = ""    
  prv_tok_text = ""  

  prefix = ""
  words_ = []
  label_ = []
  tags_ = []

  doc = nlp(sent) 
  
  for tok in doc:
      words_.append(tok.text)
      label_.append(tok.pos_)

      if(tok.text=='JUDGMENT'):
          tags_.append('O')
          next_token1 = doc[tok.i+1]
          #next_tok_loc1 = tok.i+1
          next_token2 = doc[tok.i+2]
          #next_tok_loc2 = tok.i+2

      if(tok.text==next_token1 and (next_token2.pos_=='PUNCT' or next_token2.pos_=='NUM')):
          tags_.append('U-Parties')


      #if(next_token1.pos_=='PROPN' and next_token2.pos_=='PROPN'):
          #tags_.append('U-Parties')

      else:
          tags_.append('O')    

  return (pd.DataFrame({'Token': words_, 'POS': label_,'Tags': tags_}))

Problem: when I call get_dataset2('JUDGMENT Gajendragadkar, J. 1.'), it extracts the tokens and POS tags successfully, but it does not produce the expected BILOU tags.

The output should look like this:

Tokens         POS       BILOU Tags
JUDGMENT       PROPN     O
Gajendragadkar PROPN     U-Parties
,              PUNCT     O

I want to iterate over the tokens so that, after JUDGMENT, I can look at the second and third tokens and assign the BILOU tags accordingly. If the entity is a single token, it should be tagged U-Parties.

Thanks!
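
Two things in the function above stand out when reading it. The else belongs only to the second if, so the JUDGMENT token receives two tags and tags_ ends up longer than words_. The second if also compares tok.text (a string) with next_token1 (a Token object) rather than with its text or index. A minimal sketch of the intended rule follows, assuming it is acceptable to look ahead by index and write the tag into a pre-filled list. The helper name bilou_tags and the use of plain (text, pos) pairs instead of spaCy tokens are illustrative assumptions, chosen so the example runs without a model installed.

```python
def bilou_tags(tokens):
    """Return one BILOU tag per token.

    tokens: list of (text, pos) pairs. This input shape is an assumption
    made for illustration; it is not part of the original function.
    """
    # Start with 'O' everywhere so every token gets exactly one tag.
    tags = ["O"] * len(tokens)

    for i, (text, _pos) in enumerate(tokens):
        # Only look ahead when two more tokens actually exist.
        if text == "JUDGMENT" and i + 2 < len(tokens):
            pos_after_name = tokens[i + 2][1]
            # A name followed directly by punctuation or a number is
            # treated as a single-token entity.
            if pos_after_name in ("PUNCT", "NUM"):
                tags[i + 1] = "U-Parties"

    return tags


# The three rows from the expected output in the question.
tokens = [("JUDGMENT", "PROPN"), ("Gajendragadkar", "PROPN"), (",", "PUNCT")]
print(bilou_tags(tokens))  # ['O', 'U-Parties', 'O']
```

With spaCy, the pairs could be built as [(t.text, t.pos_) for t in nlp(sent)], and the returned list could be passed as the 'Tags' column of the DataFrame. Because the list is created with the same length as the token list, the three columns cannot get out of step.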



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
