Return cosine similarity not as a single value
How can I write a pure NumPy function that returns an array, shaped by the two input arrays, containing the cosine similarities of all pairwise comparisons of their rows?
I don't want it to return a single value.
dataSet1 = [5, 6, 7, 2]
dataSet2 = [2, 3, 1, 15]
def cosine_similarity(list1, list2):
    # How to?
    pass
print(cosine_similarity(dataSet1, dataSet2))
Solution 1:[1]
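You can use SciPy for this, as stated in this answer. Note that spatial.distance.cosine returns the cosine distance, so the similarity is 1 minus that value.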
from scipy import spatial
dataSet1 = [5, 6, 7, 2]
dataSet2 = [2, 3, 1, 15]
result = 1 - spatial.distance.cosine(dataSet1, dataSet2)
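Since the question asks for pure NumPy, the same quantity can also be computed without SciPy. The following is a minimal sketch and not part of the original answer: the helper name cosine_similarity_1d is made up for illustration, and returning nan for an all-zero vector is an assumption about the desired behavior.

```python
import numpy as np

def cosine_similarity_1d(a, b):
    """Cosine similarity of two 1-D sequences (hypothetical helper name)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # The similarity is undefined for a zero vector; nan is a choice made here.
        return float("nan")
    return float(np.dot(a, b) / denom)

dataSet1 = [5, 6, 7, 2]
dataSet2 = [2, 3, 1, 15]
result = cosine_similarity_1d(dataSet1, dataSet2)  # approximately 0.394
```

Like the SciPy version, this still yields a single value, because each input is a single vector.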
Solution 2:[2]
You can also use the cosine_similarity function from sklearn.
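import numpy as np
from sklearn.feature_extraction.text import CountVectorizer  # only needed if the documents are text
from sklearn.metrics.pairwise import cosine_similarity

def cos(docs):
    # A single document has nothing to be compared with.
    if len(docs) == 1:
        return []
    # `tokenize` is a user-supplied tokenizer callable; it is not defined in this answer.
    count_vectorizer = CountVectorizer(tokenizer=tokenize)
    # Replace missing entries (np.nan) with a placeholder token.
    doc1 = ['missing' if x is np.nan else x for x in docs]
    count_vec = count_vectorizer.fit_transform(doc1)
    # Pairwise cosine similarity between all documents, shape (n_docs, n_docs).
    cosine_sim_matrix = cosine_similarity(count_vec)
    return cosine_sim_matrix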
Solution 3:[3]
What you are looking for is cosine_similarity from the sklearn library.
Here is a simple example:
Suppose x holds 3 vectors of dimension 5 and y holds a single vector of the same dimension. We can compute the cosine similarity as follows:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
x = np.random.rand(3,5)
y = np.random.rand(1,5)
# >>> x
# array([[0.21668023, 0.05705532, 0.6391782 , 0.97990692, 0.90601101],
#        [0.82725409, 0.30221347, 0.98101159, 0.13982621, 0.88490538],
#        [0.09895812, 0.19948788, 0.12710054, 0.61409403, 0.56001643]])
# >>> y
# array([[0.70531146, 0.10222257, 0.6027328 , 0.87662291, 0.27053804]])
cosine_similarity(x, y)
The output contains the cosine similarity of each of the 3 vectors in x with the single vector in y, so it has shape 3x1:
array([[0.84139047],
       [0.75146312],
       [0.75255157]])
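The question asks for pure NumPy, so here is a minimal sketch of a row-wise pairwise version that returns the same output shape as cosine_similarity(x, y) from sklearn. It is not part of the original answer: the function name pairwise_cosine_similarity is made up for illustration, and giving all-zero rows a similarity of 0 instead of nan is an assumption.

```python
import numpy as np

def pairwise_cosine_similarity(x, y=None):
    """Cosine similarity between every row of x and every row of y.

    Returns an array of shape (len(x), len(y)). If y is omitted, the rows
    of x are compared with each other. Hypothetical helper name.
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    y = x if y is None else np.atleast_2d(np.asarray(y, dtype=float))

    # Euclidean norm of each row, kept as a column so it broadcasts over the row.
    x_norm = np.linalg.norm(x, axis=1, keepdims=True)
    y_norm = np.linalg.norm(y, axis=1, keepdims=True)

    # Assumption: divide zero rows by 1 so they give similarity 0 rather than nan.
    x_unit = x / np.where(x_norm == 0.0, 1.0, x_norm)
    y_unit = y / np.where(y_norm == 0.0, 1.0, y_norm)

    # Dot products of unit vectors are the cosine similarities.
    return x_unit @ y_unit.T

x = np.random.rand(3, 5)
y = np.random.rand(1, 5)
sims = pairwise_cosine_similarity(x, y)  # shape (3, 1)
```

Normalizing the rows first and then taking one matrix product avoids an explicit Python loop over the row pairs. Calling pairwise_cosine_similarity(x) with a single argument gives the 3x3 matrix of similarities among the rows of x.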
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Helge Schneider |
| Solution 2 | Will A |
| Solution 3 | Ersel Er |
