Using word2vec vectors to train random forest
I am working on sentiment analysis, and one of my features comes from word embeddings generated with word2vec.
I am using 350 dimensions, so I get an array of 350 values for each word.
I am planning to take the average of those values and use that single value as the vector.
I am storing the values as plain values, for example:
Review: i am a good boy
Vectors: for i 566 6 7 7, for am 66 7 7 7u, for a 77777766, for good 6666 566 6, etc.
Any help would be greatly appreciated.
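The word "average" can mean two different things here, so it is worth spelling out. Averaging the 350 values of a single word collapses that word to one number, whereas averaging the word vectors across a review keeps every dimension and gives one fixed-length vector per review. A minimal sketch with NumPy, using made-up 4-dimensional vectors (the values and the `toy_vectors` name are hypothetical, not real word2vec output):

```python
import numpy as np

# Hypothetical 4-dimensional embeddings standing in for real 350-dimensional ones.
toy_vectors = {
    "i":    np.array([0.1, 0.2, 0.3, 0.4]),
    "am":   np.array([0.0, 0.4, 0.2, 0.6]),
    "good": np.array([0.9, 0.8, 0.7, 0.6]),
}

review = ["i", "am", "good"]

# Option 1: average inside each word vector -> one scalar per word.
per_word_scalar = {w: float(toy_vectors[w].mean()) for w in review}

# Option 2: average across the words of the review -> one vector per review.
review_vector = np.mean([toy_vectors[w] for w in review], axis=0)

print(per_word_scalar)
print(review_vector.shape)  # (4,)
```

Option 1 throws away almost all of the information in the embedding, so Option 2 is usually the more useful input for a classifier.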
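Here is how I solved this:

    from nltk.tokenize import word_tokenize

    # Assumed to be defined elsewhere: `model` (a trained gensim Word2Vec model),
    # `words` (words present in the model's vocabulary), `rTxt` (list of review strings).

    # Strip surrounding whitespace from each vocabulary word.
    cleanWords = [w.strip() for w in words]

    # Map each word to its embedding vector.
    vectorsDict = {w: model.wv[w] for w in cleanWords}

    # Store the vector of every known token as a comma-separated string.
    vectorized = []
    for review in rTxt:
        for token in word_tokenize(review):
            # Direct dictionary lookup; scanning every key and overwriting the
            # loop variable would later compare an array against a string.
            if token in vectorsDict:
                vectorized.append(','.join(str(v) for v in vectorsDict[token]))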
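To actually feed this into a random forest, each review needs one fixed-length numeric row rather than a list of strings. A common approach (not necessarily what the code above was later used for) is to average the word vectors of each review and pass the resulting matrix to scikit-learn's `RandomForestClassifier`. The sketch below assumes NumPy and scikit-learn are installed; `toy_vectors`, `reviews`, `labels`, and the `review_to_vector` helper are hypothetical stand-ins for `vectorsDict`, `rTxt`, and real sentiment labels.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical 3-dimensional embeddings; in practice use vectorsDict (350 dims).
toy_vectors = {
    "good":  np.array([0.9, 0.1, 0.0]),
    "great": np.array([0.8, 0.2, 0.1]),
    "bad":   np.array([-0.9, 0.1, 0.0]),
    "awful": np.array([-0.8, 0.3, 0.1]),
    "boy":   np.array([0.0, 0.5, 0.5]),
}
DIM = 3


def review_to_vector(tokens, vectors, dim):
    """Average the vectors of known tokens; return zeros if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return np.zeros(dim)
    return np.mean(known, axis=0)


# Hypothetical pre-tokenized reviews with labels (1 = positive, 0 = negative).
reviews = [["good", "boy"], ["great", "boy"], ["bad", "boy"], ["awful", "boy"]]
labels = [1, 1, 0, 0]

# One row per review, one column per embedding dimension.
X = np.vstack([review_to_vector(r, toy_vectors, DIM) for r in reviews])

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, labels)

predictions = clf.predict(X)
print(X.shape)            # (4, 3)
print(predictions.shape)  # (4,)
```

Returning a zero vector for reviews with no known words is one possible design choice; dropping such reviews is another.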
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
