Movie Recommendation: Generate similar movies for all movies
I'm using scikit-learn's TfidfVectorizer to produce the TF-IDF matrix:
#Import TfidfVectorizer from scikit-learn
from sklearn.feature_extraction.text import TfidfVectorizer
#Define a TF-IDF Vectorizer Object. Remove all english stop words such as 'the', 'a'
tfidf = TfidfVectorizer(stop_words='english')
#Replace NaN with an empty string
df3['genres'] = df3['genres'].fillna('')
#Construct the required TF-IDF matrix by fitting and transforming the data
tfidf_matrix = tfidf.fit_transform(df3['genres'])
#Output the shape of tfidf_matrix
tfidf_matrix.shape
Output (shape): (62423, 23)
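The function below relies on `cosine_sim` and `indices`, which are not shown in this post. A minimal sketch of how they might be built, assuming `linear_kernel` for the similarity matrix and a title-to-position Series for the lookup (the `toy_*` names and the three-row data are hypothetical stand-ins for `df3`):

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Toy stand-in for df3 (hypothetical data, not the real dataset)
toy = pd.DataFrame({
    "title": ["Toy Story (1995)", "Antz (1998)", "Heat (1995)"],
    "genres": [
        "Adventure Animation Children",
        "Adventure Animation Children",
        "Action Crime Thriller",
    ],
})

toy_matrix = TfidfVectorizer(stop_words="english").fit_transform(toy["genres"])

# TF-IDF rows are L2-normalised by default, so the plain dot product
# between rows equals their cosine similarity
toy_cosine_sim = linear_kernel(toy_matrix, toy_matrix)

# Map each title to its row position (assumed shape of `indices`)
toy_indices = pd.Series(toy.index, index=toy["title"])

print(toy_cosine_sim.shape)
print(toy_indices["Heat (1995)"])
```

Note that a dense similarity matrix for 62423 movies holds 62423 x 62423 values, which is on the order of 30 GB in float64, so the full matrix may not fit in memory on every machine.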
# Function that takes in movie title as input and outputs most similar movies
def get_recommendations(title, cosine_sim=cosine_sim):
    # Get the index of the movie that matches the title
    idx = indices[title]
    # Get the pairwise similarity scores of all movies with that movie
    sim_scores = list(enumerate(cosine_sim[idx]))
    #print(sim_scores)
    # Sort the movies based on the similarity scores
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse = True)
    
    # Keep the 4 most similar movies, skipping the first entry (the movie itself)
    sim_scores = sim_scores[1:5]
    
    #print(sim_scores)
    #sim_scores.dtypes
    # Get the movie indices
    movie_indices = [i[0] for i in sim_scores]
    
    #newcol.append(df3['title'].iloc[movie_indices])
    #df['newcol'] = newcol
    
    # Return the titles of the most similar movies
    #return df2['NEW'] = movie_indces.toarray().tolist() 
    return df3['title'].iloc[movie_indices]
get_recommendations('Toy Story (1995)')
I get the Output:
2203                                       Antz (1998)
3021                                Toy Story 2 (1999)
3653    Adventures of Rocky and Bullwinkle, The (2000)
3912                  Emperor's New Groove, The (2000)
Name: title, dtype: object
However, I want to see recommendations for all movies based on the genre similarity index, and store the results in an array column, like so:
frame = pd.DataFrame(columns=['recommended', 'newcol'])
result = []
movie = []
k = 0
for i in df3['title']:
    recommendations = get_recommendations(i)
    result.append(recommendations)
    movie.append(i)
    k += 1
    if k == 3:
        break
frame['recommended'] = result
frame['newcol'] = movie
frame.head()
Output:

The for loop works when I stop after k = 3 iterations. However, if I try 1,000 iterations, I get an error:
frame = pd.DataFrame(columns=['recommended', 'newcol'])
result = []
movie = []
k = 0
for i in df3['title']:
    recommendations = get_recommendations(i)
    result.append(recommendations)
    movie.append(i)
    k += 1
    if k == 1000:
        break
frame['recommended'] = result
frame['newcol'] = movie
Output ERROR:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/var/folders/rh/nqts7dj17knc17lldph1ctmc0000gn/T/ipykernel_30140/4186910855.py in <module>
     10     #while k<2:
     11     #pd.notna(df2['title']):
---> 12         recommendations = get_recommendations(i)
     13         result.append(recommendations)
     14         movie.append(i)
/var/folders/rh/nqts7dj17knc17lldph1ctmc0000gn/T/ipykernel_30140/2131963331.py in get_recommendations(title, cosine_sim)
      8     #print(sim_scores)
      9     # Sort the movies based on the similarity scores
---> 10     sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse = True)
     11 
     12     # Get the scores of the 10 most similar movies
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
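One possible cause, offered as a guess rather than a confirmed diagnosis: if `indices` is a pandas Series keyed by title (an assumption, since its definition is not shown) and some titles occur more than once in `df3`, then `indices[title]` returns several positions instead of one. `cosine_sim[idx]` then becomes a 2-D array, each "score" handed to `sorted` is a whole row, and comparing two rows raises this exact ValueError. The toy data and `toy_*` names below are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical titles containing one duplicate
toy_titles = pd.Series(["Heat (1995)", "Emma (1996)", "Emma (1996)"])
toy_indices = pd.Series(toy_titles.index, index=toy_titles)
toy_sim = np.eye(3)  # placeholder similarity matrix

unique_lookup = toy_indices["Heat (1995)"]  # a single integer position
dup_lookup = toy_indices["Emma (1996)"]     # a Series holding positions 1 and 2

# With several positions, the lookup selects several rows at once
rows = toy_sim[dup_lookup]
print(rows.shape)  # (2, 3)

# Sorting (index, row) pairs by the row compares arrays element-wise,
# which cannot be reduced to a single True/False
error_message = None
try:
    sorted(enumerate(rows), key=lambda x: x[1], reverse=True)
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Number of rows whose title appears more than once
n_duplicated = int(toy_titles.duplicated(keep=False).sum())
print(n_duplicated)
```

Running `df3['title'].duplicated(keep=False).sum()` on the real data would show whether duplicate titles exist; a loop limited to 3 iterations may simply never reach one.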
I look forward to seeing suggestions.
For reference, I'm using Python 3.9.
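For the array column itself, one option is to look movies up by row position rather than by title, so that repeated titles cannot turn a lookup into several rows. This is only a sketch under assumptions: `recommend_by_position` is a hypothetical helper, the toy matrix stands in for `cosine_sim`, and the movie itself is dropped explicitly instead of by skipping the first sorted entry.

```python
import numpy as np
import pandas as pd

def recommend_by_position(pos, sim, titles, top_n=4):
    """Return the top_n most similar titles for the row at integer position pos."""
    order = np.argsort(-sim[pos], kind="stable")  # positions by descending similarity
    order = order[order != pos][:top_n]           # drop the movie itself, keep top_n
    return titles.iloc[order].tolist()

# Toy data (hypothetical); note the repeated title "B"
toy_titles = pd.Series(["A", "B", "B", "C"])
toy_sim = np.array([
    [1.0, 0.9, 0.2, 0.5],
    [0.9, 1.0, 0.1, 0.3],
    [0.2, 0.1, 1.0, 0.8],
    [0.5, 0.3, 0.8, 1.0],
])

# One list of recommended titles per movie, stored as a single column
frame = pd.DataFrame({
    "title": toy_titles,
    "recommended": [
        recommend_by_position(p, toy_sim, toy_titles, top_n=2)
        for p in range(len(toy_titles))
    ],
})
print(frame)
```

Each cell of `recommended` holds a plain Python list, which keeps the column easy to inspect or explode later.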
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
