Is there any faster way to get word embeddings given sub-word embeddings in BERT?
Using bert.tokenizer I can get the sub-word ids and the word spans of the words in a sentence. For example, given the sentence "This is an example", I get encoded_text, the embeddings of the sub-words ["th", "##is", "an", "exam", "##ple"], and the word_spans list [[0, 2], [2, 3], [3, 5]]. My implementation is:
import torch

word_embeddings = torch.empty(len(word_spans), 768, device='cuda')
for seq, (start, end) in enumerate(word_spans):
    # average the sub-word vectors belonging to the current word
    word_embeddings[seq] = encoded_text[start:end].mean(dim=0)
Is there any faster way to combine the vectors of all sub-words of the same word in PyTorch?
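For context, a per-word loop like the one above can usually be replaced by a single scatter-style reduction. The sketch below is one illustrative way to do that with index_add_; it is not taken from the original post and assumes the encoded_text tensor of shape [num_subwords, 768] and the contiguous word_spans list from the question.

import torch

# number of sub-words in each word, e.g. [2, 1, 2] for the spans above
lengths = torch.tensor([end - start for start, end in word_spans],
                       device=encoded_text.device)
# word index for every sub-word position, e.g. [0, 0, 1, 2, 2]
word_ids = torch.repeat_interleave(
    torch.arange(len(word_spans), device=encoded_text.device), lengths)
sums = torch.zeros(len(word_spans), encoded_text.size(1), device=encoded_text.device)
sums.index_add_(0, word_ids, encoded_text)      # sum the sub-word vectors of each word
word_embeddings = sums / lengths.unsqueeze(1)   # divide by sub-word counts -> mean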
Solution 1:[1]
I used the Flair library and it solved my problem.
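For readers landing here: Flair's TransformerWordEmbeddings wraps a Hugging Face transformer and pools the sub-word vectors into one embedding per word internally. The snippet below is only a minimal sketch of that route; the model name and the subtoken_pooling choice are illustrative assumptions, not details from the answer.

from flair.data import Sentence
from flair.embeddings import TransformerWordEmbeddings

# Illustrative: model name and pooling mode are assumptions, not from the answer.
embedding = TransformerWordEmbeddings('bert-base-uncased', subtoken_pooling='mean')
sentence = Sentence('This is an example')
embedding.embed(sentence)
for token in sentence:
    print(token.text, token.embedding.shape)   # one pooled vector per word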
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | lalalla_schnee |