Predict values with TensorFlow recommender system model using context features

I am currently trying to build a recommender system with TensorFlow on my own dataset (user, item, weekday). I have a first version that uses only user-item interactions. Now I want to extend it with a context feature (weekday of interaction) like here.

I adapted my model and it trains fine; TensorFlow's model.evaluate() also works. As I want to compare the results to some self-written models, I need to use exactly the same metrics. So I tried to get a prediction for every interaction and then calculate the metrics my way.

This led to problems with the format of the data, as I have to pass the user_id as well as the weekday. So I went back to the aforementioned example and tried to get results either by using model.predict() or by using tfrs.layers.factorized_top_k.BruteForce() as described for example here.

In the first mentioned notebook, I added the following code at the end:

index = tfrs.layers.factorized_top_k.BruteForce(model.query_model)
index.index_from_dataset(
    tf.data.Dataset.zip((movies.batch(100), 
    movies.batch(100).map(model.candidate_model)))
)
predictions_1 = index(ratings, 10)
predictions_2 = model.predict(cached_test)

BruteForce way

Trying to get predictions_1 gives me

'CacheDataset' object is not subscriptable

in the call() of the UserModel. I understand that this is caused by accessing inputs[something] when inputs does not support indexing like this, but I don't know what the correct input type is. I tried creating other Dataset objects (MapDataset etc.), but none of them are subscriptable. Then I tried building a Tensor and accessing it with indexing ([0, :] for user_id etc.). That does not work either, because the Sequential layer can't handle slices. Converting to NumPy does not work either.

model.predict way

Trying to get predictions_2, I implemented the call()-function in the MovieLensModel as described here:

def call(self, inputs):
    query_embeddings = self.query_model({
        "user_id": inputs["user_id"],
        "timestamp": inputs["timestamp"],
    })
    movie_embeddings = self.candidate_model(inputs["movie_title"])

    return tf.matmul(query_embeddings, movie_embeddings, transpose_a=True)

I know that this cannot be the final or correct way, but I see it as a first try. I am not yet trying to get the final result, just some kind of interaction matrix. I do get a result, but it has the shape (160, 32). As 32 is the embedding dimension and there are many more users and movies in the test data (942 and 1425), I don't know where the 160 comes from. Both embedding results have the shape (None, 32). I thought about batches, but then I would expect multiple subtensors in the result.
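The (160, 32) shape is consistent with how the pieces compose, which can be checked with plain shape arithmetic. The sketch below assumes the test set has 20,000 rows batched in chunks of 4,096 (the values used in the TFRS context-features tutorial; adjust them if yours differ) and relies on two behaviours: tf.matmul with transpose_a=True transposes the first operand, and model.predict() concatenates the outputs of all batches along the first axis. The helper matmul_shape is hypothetical and only mimics the shape rule; it does not call TensorFlow.

```python
import math


def matmul_shape(a_shape, b_shape, transpose_a=False, transpose_b=False):
    """Shape of a 2-D matrix product, mimicking tf.matmul's transpose flags."""
    a = a_shape[::-1] if transpose_a else a_shape
    b = b_shape[::-1] if transpose_b else b_shape
    if a[1] != b[0]:
        raise ValueError(f"inner dimensions differ: {a} x {b}")
    return (a[0], b[1])


# Assumed values, not taken from the question: 20,000 test rows, batch size 4,096.
NUM_ROWS, BATCH_SIZE, EMBEDDING_DIM = 20_000, 4_096, 32

num_batches = math.ceil(NUM_ROWS / BATCH_SIZE)

# transpose_a=True: (32, B) @ (B, 32) gives (32, 32) for every batch,
# regardless of the batch size B.
per_batch = matmul_shape(
    (BATCH_SIZE, EMBEDDING_DIM), (BATCH_SIZE, EMBEDDING_DIM), transpose_a=True
)

# predict() stacks the per-batch outputs along axis 0.
predicted = (num_batches * per_batch[0], per_batch[1])

# transpose_b=True: (B, 32) @ (32, B) gives a (B, B) score matrix per batch.
pairwise = matmul_shape(
    (BATCH_SIZE, EMBEDDING_DIM), (BATCH_SIZE, EMBEDDING_DIM), transpose_b=True
)

print(num_batches, per_batch, predicted, pairwise)
```

Under these assumptions every batch yields a (32, 32) matrix and five batches stack to (160, 32). With transpose_b=True the result for one batch would instead be a (batch, batch) matrix that scores each query in the batch against each movie in the same batch, which is still not a score against the full catalogue.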

Moreover, the debugger does not step into this call() function, yet print statements placed inside it do produce output. So it seems to be executed, but I cannot step into it.

Questions

  1. Has anyone used TFRS with context features and found a way to predict item values for a (user, feature) combination?
  2. Does anyone have an idea which data type I can use for the first prediction attempt?
  3. What is wrong with the idea behind the second approach? How should it be done?
  4. Are these even good ideas, or is this a completely wrong approach because I'm missing something?

EDIT:

I found out that one of the problems is how batches are fed to the BruteForce layer. If I do the following, I get reasonable results in the form of a tensor containing k movie titles and the corresponding ratings:

for batch in cached_test:
    predictions = index(batch, 10)

Nevertheless, this cannot be the preferred way, as I get warnings because I feed a dict to the model.

WARNING:tensorflow:Layers in a Sequential model should only have a single input tensor, but we receive a <class 'dict'> input: ...

So this seems like a workaround, and the question of the intended way to do this remains.
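To compute custom metrics from such a loop, the per-batch results have to be collected rather than overwritten on each iteration. The following is a plain-Python sketch of that pattern, not TFRS code: brute_force_top_k is a hypothetical stand-in that does what a brute-force retrieval does conceptually (score every candidate by dot product with the query embedding and keep the k best), and the two-dimensional embeddings are made up for illustration.

```python
import heapq


def brute_force_top_k(query_embeddings, candidates, k):
    """For each query vector, return the k best (score, identifier) pairs.

    candidates is a list of (identifier, embedding) pairs.
    """
    results = []
    for query in query_embeddings:
        scored = [
            (sum(q * c for q, c in zip(query, embedding)), identifier)
            for identifier, embedding in candidates
        ]
        results.append(heapq.nlargest(k, scored))
    return results


# Hypothetical embeddings, for illustration only.
candidates = [
    ("movie_a", [1.0, 0.0]),
    ("movie_b", [0.0, 1.0]),
    ("movie_c", [1.0, 1.0]),
]
batches = [
    [[2.0, 0.5]],              # first batch: one query embedding
    [[0.5, 3.0], [1.0, 2.0]],  # second batch: two query embeddings
]

# Accumulate the results of every batch instead of keeping only the last one.
all_predictions = []
for batch in batches:
    all_predictions.extend(brute_force_top_k(batch, candidates, k=2))

for top in all_predictions:
    print([identifier for _, identifier in top])
```

With the real index, the same accumulation would apply to the score and identifier tensors returned for each batch; the list ends up with one top-k entry per query row, which can then be compared against the observed items.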

Versions

I am running:

  • tensorflow: 2.7.0
  • tensorflow-recommenders: 0.6.0
  • python: 3.8.5


Solution 1:[1]

I was facing a similar issue. I resolved it by passing the data in dict format, built from a DataFrame:

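# Give each feature of row i a batch dimension, then pass the resulting dict to the index.
features = df.loc[i, ['user_id', 'weekday']].map(lambda x: tf.expand_dims(x, axis=0))
prediction_1 = index(dict(features))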

Note: If your issue is not resolved yet, share your model code...

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 FluxedScript