What is the right way to sum word2vec vectors generated by Gensim?

I have four 300-dimensional word2vec vectors like:

v1=model.wv.get_vector('A')
v2=model.wv.get_vector('B')
v3=model.wv.get_vector('C')
v4=model.wv.get_vector('D')

I want to compare cosine similarity of v1+v2 and v3+v4.

Should I reduce them to 2-dimensional vectors first or not?

What numpy function should I use?



Solution 1:[1]

You can add the vectors with simple Python math operators:

va = v1 + v2
vb = v3 + v4

numpy doesn't actually have a cosine-similarity (or cosine-distance) function, so you'd have to compute it yourself from the dot product and the vector norms (both of which numpy provides):

cossim = np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb))
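Putting the two steps together, here's a minimal self-contained sketch. It uses random arrays as stand-ins for the four 300-dimensional vectors, since the original `model` isn't available here; in practice you'd pull them from `model.wv.get_vector(...)` as in the question:

```python
import numpy as np

# Hypothetical stand-ins for the four 300-dimensional Gensim vectors;
# in real code these would come from model.wv.get_vector('A'), etc.
rng = np.random.default_rng(0)
v1, v2, v3, v4 = rng.normal(size=(4, 300)).astype(np.float32)

# Element-wise addition of the vector pairs.
va = v1 + v2
vb = v3 + v4

# Cosine similarity from the dot product and the two vector norms.
cossim = np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb))
print(cossim)
```

Note there's no need to reduce the vectors to 2 dimensions first: cosine similarity is defined for vectors of any (matching) dimensionality.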

Or, you could leverage the cosine-distance function in scipy, and convert it to cosine-similarity by subtracting it from 1:

from scipy.spatial.distance import cosine
cosdist = cosine(va, vb)
cossim = 1 - cosdist
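The two approaches give the same number, which is easy to verify. A small sketch (again with random stand-in vectors, since the real ones aren't available here):

```python
import numpy as np
from scipy.spatial.distance import cosine

# Stand-ins for the summed vectors va = v1 + v2 and vb = v3 + v4.
rng = np.random.default_rng(1)
va, vb = rng.normal(size=(2, 300))

# scipy returns cosine *distance*; subtract from 1 for similarity.
cossim_scipy = 1 - cosine(va, vb)

# Manual numpy formula for comparison.
cossim_numpy = np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb))
print(cossim_scipy, cossim_numpy)
```

Use whichever fits your dependencies; the scipy version is more readable, while the numpy version avoids an extra import.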

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 gojomo