Are there pretrained embeddings for subword tokenizers like SentencePiece or WordPiece?
I've been trying to find pretrained embeddings for subword tokenizers like SentencePiece or WordPiece, but have been unsuccessful. Do pretrained embeddings for these exist? Is there a library that can take a corpus and produce subword embeddings for any given sentence?
My conjecture is that a subword tokenizer would work much better than a traditional word-level tokenizer for my task, but I don't understand how to transform the subword tokens into embeddings. I don't want to use a full BERT architecture because of its size, so I'm looking for a lighter alternative.
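For what it's worth, the mechanics are the same as with word embeddings: tokenize into subwords, look each subword up in an embedding table, and pool (e.g. average) the vectors. Below is a minimal stdlib-only sketch of that pipeline. The vocabulary and the random embedding table are toy assumptions purely for illustration; in practice the vocabulary would come from a trained tokenizer (e.g. `sentencepiece` or Hugging Face `tokenizers`) and the table from pretrained vectors or your own training run.

```python
import random

# Toy WordPiece-style vocabulary (assumption for illustration);
# a real vocab comes from a trained subword tokenizer.
vocab = ["un", "##break", "##able", "break", "able", "token", "##izer", "[UNK]"]

# Hypothetical embedding table: one small random vector per subword.
# A pretrained or trained table would replace this.
random.seed(0)
emb = {tok: [random.uniform(-1, 1) for _ in range(4)] for tok in vocab}

def wordpiece(word, vocab):
    """Greedy longest-match-first WordPiece-style tokenization."""
    pieces, start = [], 0
    while start < len(word):
        end, cur = len(word), None
        while start < end:
            sub = word[start:end]
            if start > 0:                # continuation pieces carry "##"
                sub = "##" + sub
            if sub in vocab:
                cur = sub
                break
            end -= 1
        if cur is None:                  # no piece matched at all
            return ["[UNK]"]
        pieces.append(cur)
        start = end
    return pieces

def embed(word):
    """Average the subword vectors to get one fixed-size word vector."""
    vecs = [emb[p] for p in wordpiece(word, vocab)]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

print(wordpiece("unbreakable", vocab))  # ['un', '##break', '##able']
print(len(embed("unbreakable")))        # 4
```

The averaging step is just one pooling choice; summing or feeding the subword vectors into a small downstream model are equally common.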
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
