Calculating average artist entropy given user prediction and tracks in recommender systems
I have to calculate the average artist entropy over users. I have solved this task for one test case, but I am not able to generalize it to other cases. The Shannon entropy formula was used to calculate the entropy of each user.
import numpy as np
import pandas as pd


def get_average_entropy_score(predictions: np.ndarray, item_df: pd.DataFrame, topK=10) -> float:
    """
    predictions - np.ndarray - predictions of the recommendation algorithm for each user.
    item_df - pd.DataFrame - information about each song with columns 'artist' and 'track'.
    returns - float - average entropy score of the predictions.
    """
    score = None
    # TODO: YOUR IMPLEMENTATION.
    # Artist of each item, indexed by the item's position in item_df.
    l = []
    for i in item_df['artist']:
        l.append(i)
    # One hard-coded accumulator per artist (A1..A4) of the test case.
    prob = 0
    prob2 = 0
    prob3 = 0
    prob4 = 0
    for i in range(len(predictions)):
        for j, v in enumerate(predictions[i]):
            if l[v] == 'A1':
                p = 1/len(predictions[i])
                prob += p
            if l[v] == 'A2':
                p = 1/len(predictions[i])
                prob2 += p
            if l[v] == 'A3':
                p = 1/len(predictions[i])
                prob3 += p
            if l[v] == 'A4':
                p = 1/len(predictions[i])
                prob4 += p
            if v != -1:
                continue
    entro1 = (prob*np.log2(prob))
    entro2 = -(prob2*np.log2(prob2) + prob3*np.log2(prob3) + prob4*np.log2(prob4))
    add = entro1 + entro2
    entropy_over_users = add/4 # number of items/user
    score = entropy_over_users
    print(entropy_over_users)
    return score
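The four hard-coded accumulators in the attempt above (prob to prob4) only work when the artist names are known in advance. One way around that is to count whichever labels actually occur. The following is a minimal sketch, assuming a hypothetical helper named shannon_entropy that is not part of the original task, and assuming an empty list should score 0:

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: an empty list carries no entropy
    counts = Counter(labels)  # works for any label values, known or not
    # p * log2(1 / p) is the same quantity as -p * log2(p)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


print(shannon_entropy(["A1", "A1", "A1", "A1"]))  # one artist only
print(shannon_entropy(["A4", "A3", "A2", "A1"]))  # four different artists
```

With a single artist the entropy is 0 bits, and with four equally frequent artists it is 2 bits.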
Now suppose I have an artist/track DataFrame like the following:
item_df = pd.DataFrame({'artist': ['A1', 'A1', 'A1', 'A1', 'A2', 'A3', 'A4']})
And I have the predictions of the recommender system, where each row lists the items recommended to one user at positions 0, 1, 2 and 3:
predictions = np.array([[0, 1, 2, 3], [6, 5, 4, 3], [-1, -1, -1, -1]])
In predictions, user 1, for example, has been recommended item 0 first, item 1 second, item 2 third and item 3 fourth. A prediction of -1 means the value should be ignored: that item has not been seen by the user and should not be included in the calculation at all.
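One detail worth checking here: in Python, indexing a list with -1 returns its last element instead of raising an error, so looking up l[-1] would silently count the placeholder as artist A4. A small sketch of a lookup that drops the -1 entries before mapping item positions to artists (the variable names valid and artists_per_user are illustrative only):

```python
import numpy as np
import pandas as pd

item_df = pd.DataFrame({"artist": ["A1", "A1", "A1", "A1", "A2", "A3", "A4"]})
predictions = np.array([[0, 1, 2, 3], [6, 5, 4, 3], [-1, -1, -1, -1]])

# Positional lookup table: artists[i] is the artist of item i.
artists = item_df["artist"].to_numpy()

artists_per_user = []
for row in predictions:
    valid = row[row != -1]  # drop the -1 placeholders before any lookup
    artists_per_user.append(artists[valid].tolist())

print(artists_per_user)
# [['A1', 'A1', 'A1', 'A1'], ['A4', 'A3', 'A2', 'A1'], []]
```

The third user ends up with an empty list, which makes it explicit that there is nothing to compute an entropy from.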
My problem is that I can't get this to work for the general case, where, for example, I don't know the artist names (A1, A2 and so on) or the track names in advance. Note also that item 0 in the predictions refers to the first track in item_df, item 1 to the second, and so on. Please help me, I don't know how to progress further. Please ask if something is unclear. Thanks!
Additional remark: solving the test case on paper gives me 0.5 if I normalize the result.
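A possible generalization is sketched below. It is not a confirmed reference solution and rests on two assumptions that the question does not spell out: users whose predictions are all -1 are left out of the average, and "normalize" means dividing each user's entropy by log2 of the number of valid recommendations (the largest entropy that many items could have). The function name average_artist_entropy and the normalize flag are hypothetical.

```python
import numpy as np
import pandas as pd


def average_artist_entropy(predictions, item_df, top_k=10, normalize=False):
    """Mean per-user Shannon entropy (bits) of the recommended artists."""
    artists = item_df["artist"].to_numpy()
    entropies = []
    for row in np.asarray(predictions):
        row = row[:top_k]
        valid = row[row != -1]  # ignore the -1 placeholders
        if valid.size == 0:
            continue  # assumption: users without valid items are skipped
        # Count each artist that occurs, whatever its name is.
        _, counts = np.unique(artists[valid], return_counts=True)
        p = counts / counts.sum()
        h = float(np.sum(p * np.log2(1.0 / p)))  # same as -sum(p * log2(p))
        if normalize and valid.size > 1:
            h /= np.log2(valid.size)  # assumption: scale by the maximum entropy
        entropies.append(h)
    return float(np.mean(entropies)) if entropies else 0.0


item_df = pd.DataFrame({"artist": ["A1", "A1", "A1", "A1", "A2", "A3", "A4"]})
predictions = np.array([[0, 1, 2, 3], [6, 5, 4, 3], [-1, -1, -1, -1]])

print(average_artist_entropy(predictions, item_df))                  # 1.0
print(average_artist_entropy(predictions, item_df, normalize=True))  # 0.5
```

Under these assumptions the normalized value matches the 0.5 obtained on paper: user 1 scores 0, user 2 scores 1 after normalization, and user 3 is skipped. If the intended normalization or the handling of empty users differs, only the two lines marked as assumptions would need to change.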
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow