How to inject training examples in gensim Word2Vec?

I'm using gensim to create a Word2Vec model. I'm wondering if there is a way to feed gensim's Word2Vec class my own examples [(target, context1), (target, context2), ...] instead of feeding it sentences.

Thanks in advance



Solution 1:[1]

The Gensim Word2Vec class expects a re-iterable sequence where each item is a list of string word tokens. It then constructs the inner 'micro-examples' (context-word -> target-word in skip-gram, or context-window -> target-word in CBOW) itself.

There's no alternate interface, or easy extension hook, for changing the micro-examples. (Though, since the source code is available, it's possible, even if not easy, to change it arbitrarily.)

If you only ever need single-word contexts and single-word targets, and are OK with every A B pair implying both an A -> B prediction and a B -> A prediction (as in standard word2vec), you may be able to approximate your desired effect by preprocessing your corpus appropriately, completely outside the Word2Vec code.

Specifically, only ever provide 2-word texts, of exactly the word pairs you want trained, as if they were full texts.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 gojomo