Using word2vec vectors to train a random forest

I am working on sentiment analysis, and one of my features is word embeddings generated with word2vec.

I am using 350 dimensions, so I get an array of 350 values for each word. I am considering two ways to use them:

  1. Taking the average of the values and using that 1 value as the vector

  2. Storing the values as plain values, for example:

Review: i am a good boy

Vectors: for i 566 6 7 7, for am 66 7 7 7u, for a 77777766, for good 6666 566 6, etc.

Any help would be greatly appreciated.


Here is how I solved this:

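from nltk.tokenize import word_tokenize

# Assumes `model` is a trained gensim Word2Vec model, `words` is the list of
# vocabulary words, and `rTxt` is the list of review strings.

# Strip surrounding whitespace from each vocabulary word
cleanWords = [w.strip() for w in words]

# Map each word to its embedding vector
vectorsDict = {w: model.wv[w] for w in cleanWords}

# For each review, store the values of all its word vectors as one
# comma-separated string; tokens without a vector are skipped
vectorized = []
for review in rTxt:
    values = []
    for word in word_tokenize(review):
        if word in vectorsDict:
            values.extend(str(v) for v in vectorsDict[word])
    vectorized.append(','.join(values))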
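Storing every value gives each review a different length, while a random forest needs one fixed-length row per review. A common alternative, which is not the same as the first option in the question, is to average the word vectors across each review instead of averaging within a word. The following is only a minimal sketch under assumptions: `average_review_vector` is a hypothetical helper, and `toy_vectors` is a made-up 3-dimensional stand-in for `vectorsDict`.

```python
import numpy as np

def average_review_vector(tokens, vectors, dim):
    """Return the mean vector of all tokens that have an embedding.

    Falls back to a zero vector when no token in the review is known.
    """
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return np.zeros(dim)
    return np.mean(known, axis=0)

# Made-up 3-dimensional vectors standing in for vectorsDict (assumption)
toy_vectors = {
    "good": np.array([1.0, 2.0, 3.0]),
    "boy": np.array([3.0, 2.0, 1.0]),
}

# Already-tokenized reviews; the second one has no known words
reviews = [["i", "am", "a", "good", "boy"], ["unknown", "words"]]

# One fixed-length row per review
X = np.vstack([average_review_vector(r, toy_vectors, dim=3) for r in reviews])
```

With real embeddings `dim` would be 350, and `X`, together with the sentiment labels, could then be passed to a classifier such as scikit-learn's `RandomForestClassifier` via `fit(X, y)`.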


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source