Tagging words with different lengths in order

Hi, I am trying to tag the words in a sentence in order. For example, my initial method labels every word separately:

Sentence: Work across a wide range of related areas
Label:    Tag    O    O O    O     O  Tag     Tag

But now I need it to tag two words together as a single keyword with one label, like this:

Sentence: Work across a wide range of related areas
Label:    Tag    O    O O    O     O  Tag     

I have a list of keywords of varying lengths and their tags. How can I tag the sentence this way while keeping the sentence order?



Solution 1:[1]

It looks like you are looking for the BIO tagging scheme (if I understood you correctly, and you want a solution for manually tagged corpora).

BIO denotes the following: B is the beginning of a chunk, I is a token inside a chunk, and O is a token outside any chunk.

Step 1

Sentence: Work     across  a  wide  range  of  related  areas
Tag:      B        O       O  O     O      O   B        I
Label:    Label_1  O       O  O     O      O   Label_2  Label_2

Step 2

Sentence: Work       across  a  wide  range  of  related    areas
Label:    B-Label_1  O       O  O     O      O   B-Label_2  I-Label_2
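
If the keywords and their labels are already available as a list, the Step 2 labels could also be produced automatically. Below is a minimal sketch, not a definitive implementation. It assumes whitespace tokenization, case-insensitive matching, and a hypothetical `keywords` dictionary that maps a tuple of lowercase words to its label. The function name `bio_tag` is made up for illustration. At each position it tries the longest phrase first, so a two-word keyword wins over a one-word keyword that starts at the same token.

```python
def bio_tag(tokens, keywords):
    """Tag tokens with BIO-prefixed labels using greedy longest-match lookup.

    `keywords` (assumed format) maps a tuple of lowercase words to a label,
    e.g. {("related", "areas"): "Label_2"}.
    """
    max_len = max((len(k) for k in keywords), default=0)
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = keywords.get(tuple(lowered[i:i + n]))
            if label is not None:
                tags[i] = "B-" + label          # first token of the chunk
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label      # remaining tokens of the chunk
                i += n                          # skip past the matched chunk
                break
        else:
            i += 1  # no keyword starts here; the token stays "O"
    return tags


tokens = "Work across a wide range of related areas".split()
keywords = {("work",): "Label_1", ("related", "areas"): "Label_2"}
tags = bio_tag(tokens, keywords)
print(list(zip(tokens, tags)))
```

Because the loop walks the sentence left to right, the tags come out in sentence order regardless of the order of the keyword list.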

Once you have tagged your corpus, align the list of tokens (list #1) with the list of Tag + Label combos (list #2), where each BIO tag is prefixed to its label, e.g., [...related, areas] and [... B-Label_2, I-Label_2]. A B tag followed by I tags with the same label marks a single chunk, so you can combine [B-Label_2, I-Label_2] into one Label_2 covering "related areas". At the very end you strip the prefixes. Expect some other intermediate steps and post-processing along the way.
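
The merging step described above can be sketched as follows. This is one possible approach under stated assumptions, not the only way to do it: the function name `merge_bio` is hypothetical, the tags are assumed to have the form `B-<label>`, `I-<label>`, or `O`, and a stray `I-` tag that does not continue a chunk with the same label is simply treated as the start of a new chunk.

```python
def merge_bio(tokens, tags):
    """Collapse BIO-tagged tokens into (phrase, label) pairs in sentence order."""
    merged = []
    for token, tag in zip(tokens, tags):
        if tag == "O":
            merged.append((token, "O"))
            continue
        prefix, label = tag.split("-", 1)  # strip the B-/I- prefix
        # An I- tag extends the previous chunk only when the labels match.
        if prefix == "I" and merged and merged[-1][1] == label:
            phrase, _ = merged[-1]
            merged[-1] = (phrase + " " + token, label)
        else:
            merged.append((token, label))
    return merged


tokens = ["Work", "across", "a", "wide", "range", "of", "related", "areas"]
tags = ["B-Label_1", "O", "O", "O", "O", "O", "B-Label_2", "I-Label_2"]
merged = merge_bio(tokens, tags)
print(merged)
```

Here "related" and "areas" end up as the single entry `("related areas", "Label_2")`, which matches the layout asked for in the question.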

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
