Given an item, how do I get recommendations of users who have not rated this item?
My use case:
Given an item, I would like to get recommendations of users who have not rated this item.
I found this amazing Python library that can answer my use case:
python-recsys https://github.com/ocelma/python-recsys
The example is given below:
Which users should see Toy Story? (e.g. which users -that have not rated Toy Story- would give it a high rating?)
```python
svd.recommend(ITEMID)
# Returns: <USERID, Predicted Rating>
[(283, 5.716264440514446),
 (3604, 5.6471765418323141),
 (5056, 5.6218800339214496),
 (446, 5.5707524860615738),
 (3902, 5.5494529168484652),
 (4634, 5.51643364021289),
 (3324, 5.5138903299082802),
 (4801, 5.4947999354188548),
 (1131, 5.4941438045650068),
 (2339, 5.4916048051511659)]
```
This implementation uses SVD to predict the ratings users would give, and returns the IDs of the users with the highest predicted ratings among those who had not yet rated the movie.
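To make the SVD idea concrete, here is a minimal NumPy sketch on a tiny dense ratings matrix. It is not python-recsys's implementation: the "0 means not rated" convention, filling missing entries with the item's mean rating before factorizing, and the function name `recommend_users_svd` are all assumptions made for this illustration.

```python
import numpy as np


def recommend_users_svd(R, item, k=2, n=10):
    """Rank users who have not rated `item` by a rank-k SVD reconstruction.

    R is a dense users x items array in which 0 means "not rated"
    (an assumption of this sketch, not python-recsys's data format).
    """
    R = np.asarray(R, dtype=float)
    rated = R != 0
    # Impute each missing entry with its item's mean observed rating
    counts = rated.sum(axis=0)
    item_means = R.sum(axis=0) / np.maximum(counts, 1)
    filled = np.where(rated, R, item_means)
    # Truncated SVD: keep only the k largest singular values
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    # Candidates: users with no rating for this item
    candidates = np.flatnonzero(~rated[:, item])
    scores = approx[candidates, item]
    # Highest predicted rating first
    order = np.argsort(-scores)[:n]
    return [(int(candidates[i]), float(scores[i])) for i in order]


R = [
    [5, 4, 0, 1],
    [4, 0, 4, 1],
    [1, 1, 0, 5],
    [0, 5, 4, 1],
    [1, 0, 1, 4],
]
# (user index, predicted rating) pairs for users who have not rated item 2
print(recommend_users_svd(R, item=2))
```

Predictions are not clipped to the rating scale, which is why values above 5 (as in the python-recsys output) are possible.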
Unfortunately, this library is written for Python 2.7, which is not compatible with my project.
I also found the Surprise library (scikit-surprise), which has a similar example:
```python
import io  # needed because of weird encoding of u.item file

from surprise import KNNBaseline
from surprise import Dataset
from surprise import get_dataset_dir


def read_item_names():
    """Read the u.item file from MovieLens 100-k dataset and return two
    mappings to convert raw ids into movie names and movie names into raw ids.
    """
    file_name = get_dataset_dir() + '/ml-100k/ml-100k/u.item'
    rid_to_name = {}
    name_to_rid = {}
    with io.open(file_name, 'r', encoding='ISO-8859-1') as f:
        for line in f:
            line = line.split('|')
            rid_to_name[line[0]] = line[1]
            name_to_rid[line[1]] = line[0]
    return rid_to_name, name_to_rid


# First, train the algorithm to compute the similarities between items
data = Dataset.load_builtin('ml-100k')
trainset = data.build_full_trainset()
sim_options = {'name': 'pearson_baseline', 'user_based': False}
algo = KNNBaseline(sim_options=sim_options)
algo.fit(trainset)

# Read the mappings raw id <-> movie name
rid_to_name, name_to_rid = read_item_names()

# Retrieve inner id of the movie Toy Story
toy_story_raw_id = name_to_rid['Toy Story (1995)']
toy_story_inner_id = algo.trainset.to_inner_iid(toy_story_raw_id)

# Retrieve inner ids of the nearest neighbors of Toy Story.
toy_story_neighbors = algo.get_neighbors(toy_story_inner_id, k=10)

# Convert inner ids of the neighbors into names.
toy_story_neighbors = (algo.trainset.to_raw_iid(inner_id)
                       for inner_id in toy_story_neighbors)
toy_story_neighbors = (rid_to_name[rid]
                       for rid in toy_story_neighbors)

print()
print('The 10 nearest neighbors of Toy Story are:')
for movie in toy_story_neighbors:
    print(movie)
```
Output:
```text
The 10 nearest neighbors of Toy Story are:
Beauty and the Beast (1991)
Raiders of the Lost Ark (1981)
That Thing You Do! (1996)
Lion King, The (1994)
Craft, The (1996)
Liar Liar (1997)
Aladdin (1992)
Cool Hand Luke (1967)
Winnie the Pooh and the Blustery Day (1968)
Indiana Jones and the Last Crusade (1989)
```
How do I change the Surprise code to get an outcome like the python-recsys example above? Thanks in advance.
Solution 1:[1]
This is just an implementation of the k-nearest neighbors algorithm. Take a look at how it works before you continue.
What's happening is that the second chunk of code you posted is just finding similar movies based on some metric (here, a Pearson baseline correlation between the movies' ratings) and returning the movies closest to Toy Story. The first bit is (probably) taking each user's already seen movies and matching them against those groups of similar movies. From there, it computes a similarity score and returns the highest-scoring users.
So take Beauty and the Beast, which would be grouped with the children's cartoons. You compare each user's previously watched movies with that group, keep only the users who have not watched Beauty and the Beast, and take the x users whose scores indicate the highest similarity between their watch history and the group that Beauty and the Beast falls into.
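One way to turn that idea into code is item-based KNN: find the movies most similar to the target, then score every user who has not rated the target by a similarity-weighted average of their ratings on those neighbors. The pure-Python sketch below is a simplified stand-in, not Surprise's `KNNBaseline` (which also uses baseline estimates and `pearson_baseline` similarity); the `ratings` layout and the names `pearson` and `recommend_users_knn` are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two items over the users who rated both.

    a, b: dicts mapping user -> rating for one item each.
    """
    common = [u for u in a if u in b]
    if len(common) < 2:
        return 0.0
    ma = sum(a[u] for u in common) / len(common)
    mb = sum(b[u] for u in common) / len(common)
    num = sum((a[u] - ma) * (b[u] - mb) for u in common)
    den = math.sqrt(sum((a[u] - ma) ** 2 for u in common)
                    * sum((b[u] - mb) ** 2 for u in common))
    return num / den if den else 0.0


def recommend_users_knn(ratings, target, k=2, n=10):
    """Rank users who have not rated `target` using item-based KNN.

    ratings: dict user -> dict item -> rating (hypothetical layout).
    """
    # Pivot to item -> {user: rating}
    by_item = {}
    for user, items in ratings.items():
        for item, r in items.items():
            by_item.setdefault(item, {})[user] = r
    target_col = by_item.get(target, {})
    # Similarity of every other item to the target
    sims = {j: pearson(target_col, col)
            for j, col in by_item.items() if j != target}
    # Keep the k most similar, positively correlated items as neighbors
    neighbors = sorted((j for j in sims if sims[j] > 0),
                       key=lambda j: sims[j], reverse=True)[:k]
    scores = []
    for user, items in ratings.items():
        if target in items:
            continue  # already rated the target: not a candidate
        used = [j for j in neighbors if j in items]
        if not used:
            continue  # no overlap with the neighbors, cannot score
        # Similarity-weighted average of the user's ratings on the neighbors
        est = sum(sims[j] * items[j] for j in used) / sum(sims[j] for j in used)
        scores.append((user, est))
    scores.sort(key=lambda p: p[1], reverse=True)
    return scores[:n]


ratings = {
    'alice': {'Toy Story': 5, 'Aladdin': 5, 'Lion King': 4, 'Heat': 1},
    'bob': {'Toy Story': 4, 'Aladdin': 4, 'Lion King': 5, 'Heat': 2},
    'carol': {'Toy Story': 1, 'Aladdin': 2, 'Lion King': 1, 'Heat': 5},
    'dave': {'Aladdin': 5, 'Lion King': 5, 'Heat': 1},
    'erin': {'Aladdin': 1, 'Lion King': 2, 'Heat': 5},
}
print(recommend_users_knn(ratings, 'Toy Story'))
```

Users who liked the movies most correlated with the target (here `dave`) rank above users who did not (`erin`).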
The math behind the algorithm is explained here: https://youtu.be/4ObVzTuFivY
Solution 2:[2]
I'm not sure if it's too late to answer, but I wanted to try the same thing and came up with this workaround using the Surprise package (I'm not sure it is the right approach, though):
```python
import numpy as np
import pandas as pd
from surprise import KNNBaseline

# Assumes ratingSA is a DataFrame with 'userID' and 'item' columns and that
# trainset was built from the same ratings (raw ids must match).
movieid = 1
# Get the list of all user ids
unique_ids = ratingSA['userID'].unique()
# Get the ids of the users who have already rated movieid
rated_ids = ratingSA.loc[ratingSA['item'] == movieid, 'userID']
# Remove the users who already rated it; predict only for the rest
users_to_predict = np.setdiff1d(unique_ids, rated_ids)

# Item-based KNN, as in the question (n_epochs, lr_all and reg_all are
# parameters of Surprise's SVD, not of KNNBaseline, so they are dropped)
algo = KNNBaseline(sim_options={'name': 'pearson_baseline', 'user_based': False})
algo.fit(trainset)

# Predict the rating each remaining user would give movieid
my_recs = []
for uid in users_to_predict:
    my_recs.append((uid, algo.predict(uid=uid, iid=movieid).est))

recommend = (pd.DataFrame(my_recs, columns=['userId', 'prediction'])
             .sort_values('prediction', ascending=False)
             .head(5))
print(recommend)
```
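The same "exclude the raters, predict for everyone else, sort" pattern can be packaged as a small helper that does not depend on pandas or on a particular algorithm. This is a sketch; the name `rank_unrated_users` is hypothetical, and `predict` can be any callable that returns an estimated rating.

```python
def rank_unrated_users(ratings, item, predict, n=10):
    """Return up to n users who have not rated `item`, best prediction first.

    ratings: iterable of tuples whose first two fields are (user, item);
             extra fields such as rating or timestamp are ignored.
    predict: any callable (user, item) -> estimated rating.
    """
    ratings = list(ratings)  # allow a generator to be read twice
    users = {r[0] for r in ratings}
    raters = {r[0] for r in ratings if r[1] == item}
    # Sort candidates first so ties keep a deterministic order
    candidates = sorted(users - raters)
    scored = [(u, predict(u, item)) for u in candidates]
    scored.sort(key=lambda p: p[1], reverse=True)
    return scored[:n]


# Toy demo with a hard-coded "model"
toy = [('u1', 'm1', 5), ('u2', 'm1', 3), ('u3', 'm2', 4), ('u4', 'm2', 2)]
toy_preds = {'u3': 4.5, 'u4': 2.0}
print(rank_unrated_users(toy, 'm1', lambda u, i: toy_preds[u]))
```

With the question's Surprise snippet, a call along the lines of `rank_unrated_users(data.raw_ratings, toy_story_raw_id, lambda u, i: algo.predict(u, i).est)` should give the python-recsys-style output, assuming `data.raw_ratings` holds `(user, item, rating, timestamp)` tuples with the same raw ids the trainset uses.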
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Zachary Brasseaux |
| Solution 2 | |
