Is there any faster way to get word embeddings given sub-word embeddings in BERT?
Using bert.tokenizer I can get the sub-word ids and the word spans of the words in a sentence. For example, given the sentence "This is an example", I get encoded_text, the embeddings of the sub-words ["th", "##is", "an", "exam", "##ple"], and the word_spans list [[0, 2], [2, 3], [3, 5]]. My implementation is:
import torch

word_embeddings = torch.empty(len(word_spans), 768, device='cuda')
for seq, (start, end) in enumerate(word_spans):
    # average the sub-word vectors belonging to the current word
    word_embeddings[seq] = encoded_text[start:end].mean(dim=0)
Is there any faster way to combine the vectors of all sub-words of the same word in PyTorch?
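For context, a per-word loop like the one above can usually be replaced by a single scatter-style reduction. The sketch below is one illustrative way to do that with index_add_; it is not taken from the original post and assumes the encoded_text tensor of shape [num_subwords, 768] and the contiguous word_spans list from the question.

import torch

# number of sub-words in each word, e.g. [2, 1, 2] for the spans above
lengths = torch.tensor([end - start for start, end in word_spans],
                       device=encoded_text.device)
# word index for every sub-word position, e.g. [0, 0, 1, 2, 2]
word_ids = torch.repeat_interleave(
    torch.arange(len(word_spans), device=encoded_text.device), lengths)
sums = torch.zeros(len(word_spans), encoded_text.size(1), device=encoded_text.device)
sums.index_add_(0, word_ids, encoded_text)      # sum the sub-word vectors of each word
word_embeddings = sums / lengths.unsqueeze(1)   # divide by sub-word counts -> mean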
Solution 1:[1]
I used the Flair library and it solved my problem.
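For readers landing here: Flair's TransformerWordEmbeddings wraps a Hugging Face transformer and pools the sub-word vectors into one embedding per word internally. The snippet below is only a minimal sketch of that route; the model name and the subtoken_pooling choice are illustrative assumptions, not details from the answer.

from flair.data import Sentence
from flair.embeddings import TransformerWordEmbeddings

# Illustrative: model name and pooling mode are assumptions, not from the answer.
embedding = TransformerWordEmbeddings('bert-base-uncased', subtoken_pooling='mean')
sentence = Sentence('This is an example')
embedding.embed(sentence)
for token in sentence:
    print(token.text, token.embedding.shape)   # one pooled vector per word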
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | lalalla_schnee |