Evaluate topic model output (LDA, LSI and BERTopic) using recall, precision and F1 measure

I trained three different topic models: LDA and LSI with gensim, and BERTopic. So far I have evaluated the models using only the coherence score (the c_v metric). I would now like to apply classification metrics (recall, precision and F1 score).

I looked for a Python implementation but couldn't find one for the case of topic model output.
Since I am dealing with a multiclass problem (many topics), I could use scikit-learn with a RandomForest classifier (or any other classifier) and generate the confusion matrix, from which I can retrieve the metrics mentioned above.

As far as I know, I need X and y, with X = my_text and y = generated_topics (is that right?). Could you help me work out what my X and y should be? I am confused about that.

I have around 10,000 documents and 80 topics (labeled topic 0, topic 1, ..., topic 79). I have also found the dominant topic for each document. Thank you.
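To make the question concrete, here is a minimal sketch of the setup I have in mind, with toy documents and labels standing in for my corpus and the dominant-topic assignments (all names and data below are illustrative):

```python
# Sketch: X = document texts (vectorized), y = dominant topic label per document.
# The docs/labels here are toy placeholders for my 10,000 documents and 80 topics.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

docs = ["the cat sat quietly", "dogs bark loudly", "cats purr softly", "the dog ran fast"] * 10
labels = [0, 1, 0, 1] * 10  # dominant topic per document, as produced by the topic model

X = TfidfVectorizer().fit_transform(docs)  # documents -> feature matrix
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=0, stratify=labels
)

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
pred = clf.predict(X_test)

print(confusion_matrix(y_test, pred))
print(classification_report(y_test, pred))  # per-class precision, recall, F1
```

With 80 real topics, `classification_report` would give one precision/recall/F1 row per topic plus macro and weighted averages.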



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
