User similarity using Matrix
User->User Similarity or Item->Item Similarity.
For example:
User1 (U1) likes Item1, Item2, Item3
User2 (U2) likes Item2, Item1, Item4
User3 (U3) likes Item5, Item6, Item1
Based on the above user preferences, the system starts suggesting items to different users. Since User1 and User2 have quite similar tastes, the system can suggest to one of them an item the other likes ("you may also like this particular item").
How are these types of problems solved in practice? Someone suggested using a matrix.
From my limited knowledge, I know that a matrix is basically a multidimensional array. For this, the entire data set needs to be pulled into a temporary buffer. But is this a rational approach when there are 10 million data records?
Is there any code available, just for the purpose of understanding?
Solution 1:[1]
Yes, I believe that matrices are immensely useful here. Say you have N persons with M items that identify their interests; then the similarity will be quantified by the distance matrix between them. For example:
>>> import numpy as np
>>> persons = np.array((
(1,0,0,1,0,1,0),
(1,0,1,0,0,1,0),
(1,1,0,0,1,0,0),
(1,0,0,1,0,1,0),
))
Thus, N=4 (persons) and M=7 (items). Compute the distance matrix, which is really a norm of the person-wise vector difference, i.e. s_ij = ||p_i - p_j||:
>>> from scipy.spatial import distance
>>> distance.cdist(persons, persons)
array([[0.        , 1.41421356, 2.        , 0.        ],
       [1.41421356, 0.        , 2.        , 1.41421356],
       [2.        , 2.        , 0.        , 2.        ],
       [0.        , 1.41421356, 2.        , 0.        ]])
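The diagonal elements are all 0 because each person is identical to themselves. Off-diagonal elements tell you how similar one person is to another: the smaller the value, the more similar the two persons are. Here, the first and fourth persons are at distance 0 because their interest vectors are identical.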
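To connect the distance matrix back to the suggestions described in the question, here is a minimal sketch that encodes the question's three users as rows, finds each user's nearest neighbour, and suggests the items that neighbour likes but the user does not. The names `likes`, `nearest`, and `suggest` are illustrative assumptions, not part of the original answer, and picking a single nearest neighbour is only one simple way to turn distances into suggestions.

```python
import numpy as np
from scipy.spatial import distance

# Rows are users U1..U3, columns are Item1..Item6 (1 = likes),
# taken from the example in the question.
likes = np.array([
    [1, 1, 1, 0, 0, 0],  # U1: Item1, Item2, Item3
    [1, 1, 0, 1, 0, 0],  # U2: Item1, Item2, Item4
    [1, 0, 0, 0, 1, 1],  # U3: Item1, Item5, Item6
])

dist = distance.cdist(likes, likes)  # Euclidean distance by default
np.fill_diagonal(dist, np.inf)       # a user is not their own neighbour
nearest = dist.argmin(axis=1)        # index of the closest other user

def suggest(user):
    """Items the nearest neighbour likes that `user` does not (hypothetical helper)."""
    neighbour = nearest[user]
    mask = (likes[neighbour] == 1) & (likes[user] == 0)
    return [f"Item{j + 1}" for j in np.flatnonzero(mask)]

print(suggest(0))  # suggestions for U1
print(suggest(1))  # suggestions for U2
```

For U1 the nearest neighbour is U2, so the only suggestion is Item4; for U2 it is U1, which yields Item3.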
So yes, matrices are definitely a good way to go.
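On the question's concern about 10 million records: a dense array is not the only way to hold such a matrix. Since each user typically likes only a small fraction of all items, one option is a sparse matrix that stores just the nonzero entries. The following is only a sketch under that assumption, using `scipy.sparse.csr_matrix`; the data and the helper `overlap_with_all` are hypothetical. It counts the items one user shares with every other user without ever building the full user-by-user matrix.

```python
import numpy as np
from scipy import sparse

# (user, item) pairs for the question's example; in practice these would
# be streamed from storage rather than written out by hand.
rows = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
cols = np.array([0, 1, 2, 0, 1, 3, 0, 4, 5])
data = np.ones(len(rows))

# Only the nonzero entries are stored.
likes = sparse.csr_matrix((data, (rows, cols)), shape=(3, 6))

def overlap_with_all(user):
    """Number of liked items `user` shares with each user (hypothetical helper)."""
    return (likes @ likes[user].T).toarray().ravel()

print(overlap_with_all(0))  # shared-item counts between U1 and U1, U2, U3
```

For 0/1 vectors the squared Euclidean distance between two users equals the sum of their item counts minus twice this overlap, so the same ordering of neighbours can be recovered from these counts.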
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Andrej Prsa |

