What is the right way to sum up word2vec vectors generated by Gensim?
I have four 300-dimensional word2vec vectors like:
v1=model.wv.get_vector('A')
v2=model.wv.get_vector('B')
v3=model.wv.get_vector('C')
v4=model.wv.get_vector('D')
I want to compare cosine similarity of v1+v2 and v3+v4.
Should I reduce them to 2-dimensional vectors first, or not?
What numpy function should I use?
Solution 1:[1]
You can add the vectors with simple Python math operators:
va = v1 + v2
vb = v3 + v4
numpy doesn't actually have a cosine-similarity (or cosine-distance) function, so you'd have to calculate it from the dot product and the vector norms (both of which numpy provides):
cossim = np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb))
Or, you could use the cosine-distance function in scipy and convert it to cosine similarity by subtracting it from 1:
cosdist = scipy.spatial.distance.cosine(va, vb)
cossim = 1 - cosdist
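To tie the pieces together, here is a self-contained sketch of both approaches; the vectors below are random stand-ins for the word2vec vectors in the question, since no trained model is available here:

```python
import numpy as np
from scipy.spatial.distance import cosine

# Random stand-ins for v1..v4 from model.wv.get_vector(...)
rng = np.random.default_rng(0)
v1, v2, v3, v4 = rng.normal(size=(4, 300)).astype(np.float32)

# Sum the word vectors elementwise -- no dimensionality reduction needed.
va = v1 + v2
vb = v3 + v4

# Cosine similarity from the dot product and the vector norms.
cossim_np = np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb))

# The same value via scipy's cosine *distance*, subtracted from 1.
cossim_sp = 1 - cosine(va, vb)

print(float(cossim_np), float(cossim_sp))
```

Both computations return the same similarity (up to floating-point precision), so you can use whichever dependency you already have.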
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | gojomo |
