Keras Embedding shape one bigger than max user_id/item_id?
I was playing around with some code to build a recommender system using collaborative filtering with a dot product. What I noticed is that when I retrieve the model's weights, the shape of each embedding matrix is one more than the maximum user_id/item_id. It seems like the ith embedding belongs to the ith user_id/item_id, but what is the 0th embedding then? Why is there an additional embedding?
Here is the code:
from tensorflow.keras.layers import Input, Embedding, Reshape, Dot
from tensorflow.keras.models import Model

users = ratings.userId.unique()
items = ratings.movieId.unique()

user_id_input = Input(shape=[1], name='user')
item_id_input = Input(shape=[1], name='item')

embedding_size = 64
user_embedding = Embedding(output_dim=embedding_size,
                           input_dim=users.shape[0] + 1,
                           input_length=1,
                           name='user_embedding')(user_id_input)
item_embedding = Embedding(output_dim=embedding_size,
                           input_dim=items.shape[0] + 1,
                           input_length=1,
                           name='item_embedding')(item_id_input)

user_vecs = Reshape([embedding_size])(user_embedding)
item_vecs = Reshape([embedding_size])(item_embedding)

y = Dot(1, normalize=False)([user_vecs, item_vecs])

model = Model(inputs=[user_id_input, item_id_input], outputs=y)
model.compile(loss='mae', optimizer='adam')

weights = model.get_weights()
# this prints the max user_id/item_id plus 1 as the first dimension
print("weights shapes", [w.shape for w in weights])
I used the MovieLens dataset; in this particular dataset the number of unique users is 610 and the number of unique items is 9724, but the shapes of the weights are [(611, 64), (9725, 64)], i.e. 611 and 9725. Why?
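For context, here is a minimal sketch in plain NumPy (not Keras itself, and the variable names are my own) of what an embedding lookup amounts to: row indexing into a weight matrix of shape (input_dim, output_dim). If the raw ids run from 1 to 610 and are used as indices directly, the table needs 611 rows so that row index 610 exists:

```python
import numpy as np

# An embedding lookup is just row indexing into a weight matrix
# of shape (input_dim, output_dim).
max_user_id = 610          # MovieLens: user ids run 1..610
embedding_size = 64

# To index rows 0..max_user_id directly with the raw ids, the table
# needs max_user_id + 1 rows; row 0 simply goes unused if ids start at 1.
user_table = np.random.randn(max_user_id + 1, embedding_size)

vec = user_table[610]      # lookup for the highest user id still works
print(user_table.shape)    # (611, 64)
print(vec.shape)           # (64,)
```

This mirrors the `input_dim=users.shape[0]+1` in the code above: the extra row exists so the largest id is a valid index.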
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow