Looping through each row in array to calculate cosine similarity

I have a subset of a dataframe that looks like:

<OUT>
PageNumber   english_only_tags
   175       flower architecture people
   162       hair red bobbles sweets flower
   576       sweets chocolate shop people

I have transformed each row of the english_only_tags column into a vector via TF-IDF, and I am calculating cosine similarity to compare each vector with every other vector in the corpus. I am using scikit-learn for all of this.

The intended output is:

<OUT> (made-up numbers)

0   0   1    2    3    ...
1 0.45 0.34 0.76  0.21
2 0.32 0.65 0.71  0.31
3 0.44 0.34 0.72  0.65
...

*where 0-3 are the vectors and the numbers within the matrix are the cosine similarity values.*

I am trying to write a function to do this, in the hope of collecting all outputs into a CSV that I can export. I have this so far, but I am unsure how to loop through each row in the array and calculate cosine similarity between one vector and all other vectors:

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import linear_kernel

def cos_similarity(vector):
    '''
    Calculate cosine/semantic similarity between one vector/row with every other vector in the array. Append the 
    values to a dataframe and export to csv.
    '''
    counter = 1
    for i in vector:
        cosine_similarities = linear_kernel(Vectors[:], Vectors).flatten()
        list = cosine_similarities.tolist() #convert array to list
        df = pd.DataFrame(list) #convert list to dataframe
        #return concatenated dataframe with all cosine similarity calculations for every vector
        matrix = pd.concat(pd.DataFrame(df), axis=1)
        counter += 1 #loop through function to next row in array
    return matrix.to_csv("Semantic_similarity_matrix.csv", mode='a', index = False, header=False)
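One possible approach, offered as a minimal sketch rather than a definitive answer: an explicit loop may not be needed, because scikit-learn's `cosine_similarity` returns the full pairwise matrix when it is given the whole TF-IDF matrix at once. The sample dataframe below copies the three rows shown above. The function name `similarity_matrix` and the choice to label rows and columns by `PageNumber` are assumptions made for illustration.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Sample rows copied from the dataframe in the question.
df = pd.DataFrame({
    "PageNumber": [175, 162, 576],
    "english_only_tags": [
        "flower architecture people",
        "hair red bobbles sweets flower",
        "sweets chocolate shop people",
    ],
})


def similarity_matrix(texts, labels):
    """Return a labelled DataFrame of pairwise cosine similarities.

    Hypothetical helper: `texts` is an iterable of strings and `labels`
    is a list used for both the row and column labels.
    """
    # One TF-IDF vector per row (sparse matrix, n_rows x n_terms).
    vectors = TfidfVectorizer().fit_transform(texts)
    # Called with a single argument, cosine_similarity compares every
    # row with every other row and returns an n_rows x n_rows array.
    sims = cosine_similarity(vectors)
    return pd.DataFrame(sims, index=labels, columns=labels)


matrix = similarity_matrix(df["english_only_tags"], df["PageNumber"].tolist())

# Write the whole matrix once, keeping the labels as header and index.
matrix.to_csv("Semantic_similarity_matrix.csv")
```

Because `TfidfVectorizer` L2-normalises each row by default, `linear_kernel(vectors, vectors)` would give the same values here. The diagonal is 1.0 (each row compared with itself), and the matrix is symmetric.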

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
