Output 2D array to a Matrix as a CSV - Python

I have a 2D array of vectorised rows, with each row representing a document in the corpus:

array[[ 0.0 0.0 0.4583 0.6584 0.0]
                              ...
      [0.4390 0.0 0.0 0.5749 0.0]]

I have calculated cosine similarity for each row/vector in the 2D array with every other vector like so:

#calculate semantic similarity for all permutations all in one go
for i in range(Vectors.shape[0]): #for each vector/row in 2D array
    for j in range(i + 1, Vectors.shape[0]): #for each row + 1 in the 2D array
        cosine_similarities = linear_kernel(Vectors[i], Vectors[j]).flatten()
        #np.savetxt("foo.csv", cosine_similarities, delimiter=",")
        pd.DataFrame(cosine_similarities).to_csv("test_matrix.csv", mode = 'a') #save into csv as a matrix

The output prior to saving into a csv looks like:

[0.5748389]
[0.5847379]
...
[0.3257490]

How can I transform the output into a matrix and save that to a CSV?

The output I'm looking for is:

   0          1           ...  76
0  0.5748389  0.5847379        0.3257490
1  ...        ...         ...   ...
...
76

UPDATE: I followed this and it worked out! Calling the cosine similarity function directly on the sparse matrix worked; I then converted the result to a list and then to a DataFrame. See "What's the fastest way in Python to calculate cosine similarity given sparse matrix data?" for more info!
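
A minimal sketch of what the update describes, assuming scikit-learn's cosine_similarity and a small made-up SciPy sparse matrix in place of the real document vectors. The variable names are illustrative, and the intermediate list conversion is skipped because a DataFrame can be built from the array directly:

```python
import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity

# Made-up stand-in for the vectorised corpus: one row per document.
vectors = sparse.csr_matrix(np.array([
    [0.0, 0.0, 0.4583, 0.6584, 0.0],
    [0.4390, 0.0, 0.0, 0.5749, 0.0],
    [0.0, 0.3, 0.0, 0.0, 0.9],
]))

# One call computes every pairwise similarity; the result is a dense
# (n_documents, n_documents) NumPy array.
similarity_matrix = cosine_similarity(vectors)

# Rows and columns are labelled 0..n-1 by default.
similarity_df = pd.DataFrame(similarity_matrix)
similarity_df.to_csv("test_matrix.csv")
```

Because the whole matrix is computed and written once, there is no need for the nested loop or for appending to the file with mode = 'a'.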



Solution 1:[1]

If your cosine_similarities.shape is

(77, 77)

then try this

df = pd.DataFrame(cosine_similarities, columns=range(77), index=range(77))
df.to_csv('yourcsv.csv')

If you don't need the index as a separate column, change the last line to:

df.to_csv('yourcsv.csv', index = False)
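
The answer assumes cosine_similarities is already a square array, whereas the loop in the question produces one value at a time. As a rough sketch of one way to get the full matrix and to see what index = False changes, the example below uses plain NumPy on made-up vectors and writes to in-memory buffers rather than files. The data and variable names are hypothetical, and it assumes no row is all zeros:

```python
import io

import numpy as np
import pandas as pd

# Made-up dense document vectors (3 documents, 5 features).
vectors = np.array([
    [0.0, 0.0, 0.4583, 0.6584, 0.0],
    [0.4390, 0.0, 0.0, 0.5749, 0.0],
    [0.0, 0.3, 0.0, 0.0, 0.9],
])

# Cosine similarity is the dot product of L2-normalised rows.
# Assumes every row has a non-zero norm.
norms = np.linalg.norm(vectors, axis=1, keepdims=True)
unit_vectors = vectors / norms
cosine_similarities = unit_vectors @ unit_vectors.T  # shape (3, 3)

df = pd.DataFrame(cosine_similarities)

# With the index: the first column of the CSV holds the row labels.
with_index = io.StringIO()
df.to_csv(with_index)

# Without the index: only the header row and the values are written.
without_index = io.StringIO()
df.to_csv(without_index, index=False)
```

Passing a filename such as 'yourcsv.csv' instead of the buffer writes the same text to disk.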

Hope this helps!

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Thirunaavukkarasu M