Using topic modelling or another NLP approach, is it possible to define the words that go into topics/categories for a better-defined topic model?

I am using topic modelling and have considered both the LDA and LSA approaches, but I have found that some of the topics are not being defined as accurately as I would like. Is it possible to assign words to topics up front to help the model learn better and more easily? If not, what alternative techniques could I use to counter this problem?

As explained above, I have tried LDA and LSA for topic modelling and found LDA to be the more accurate, giving a coherence score of 0.46, and I have renamed the topics. However, the words in the topics do not reflect the topic names, so the model needs tuning.

I have researched other NLP solutions such as keyword extraction and named entity recognition (NER), but do not think they are suitable for my problem.

I would like 2 levels of categorization if possible, where level 1 is an overview and level 2 gives more detail. The example below is loosely summarized from customer feedback:

Level 1

  • Training

  • Communication

  • Technology

  • Products & Services

  • Other

Level 2

  • Internal

  • External

  • Resolution Good

  • Resolution Bad

  • Unclear feedback

Ideally this is the format I would like the topic modelling output to produce, but I am unsure whether this is viable.

Realistically, working on the weighting of the text would work. Example:

'Great training from the company' - Would be categorized as Training (Level 1) and Resolution Good (Level 2). The words being picked up here are 'great' and 'training', as they outweigh the other words in terms of categorization.
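The weighting idea described above can be sketched with plain keyword scores. This is only a minimal illustration, not a topic model: the keyword lists and weights are hypothetical, only some of the Level 2 labels are covered, and "Other" / "Unclear feedback" are assumed to act as fallbacks when no keyword matches.

```python
import re

# Hypothetical keyword weights per category. The words and weights are
# illustrative assumptions, not values taken from any trained model.
LEVEL1_KEYWORDS = {
    "Training": {"training": 2.0, "course": 1.5, "onboarding": 1.5},
    "Communication": {"email": 1.5, "call": 1.0, "response": 1.0},
    "Technology": {"app": 1.5, "website": 1.5, "software": 1.5},
    "Products & Services": {"product": 1.5, "service": 1.5, "delivery": 1.0},
}
LEVEL2_KEYWORDS = {
    "Resolution Good": {"great": 2.0, "good": 1.5, "helpful": 1.5},
    "Resolution Bad": {"poor": 2.0, "bad": 1.5, "slow": 1.0},
}


def score(tokens, keyword_weights):
    """Sum the weights of matching tokens for every category."""
    return {
        category: sum(weights.get(token, 0.0) for token in tokens)
        for category, weights in keyword_weights.items()
    }


def best_category(tokens, keyword_weights, fallback):
    """Return the highest-scoring category, or the fallback if nothing matched."""
    totals = score(tokens, keyword_weights)
    category, weight = max(totals.items(), key=lambda item: item[1])
    return category if weight > 0 else fallback


def categorize(text):
    """Assign a (level 1, level 2) pair to a piece of feedback."""
    tokens = re.findall(r"[a-z]+", text.lower())
    level1 = best_category(tokens, LEVEL1_KEYWORDS, fallback="Other")
    level2 = best_category(tokens, LEVEL2_KEYWORDS, fallback="Unclear feedback")
    return level1, level2


print(categorize("Great training from the company"))
# → ('Training', 'Resolution Good')
```

Here 'great' and 'training' carry the only non-zero weights, so they decide both levels. Keeping the two levels as separate keyword tables means each level can be tuned independently.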

Happy to provide further information if required.



Solution 1:[1]

As you know, topic modelling is generally an unsupervised technique, so I doubt you can solve your complex problem (2 levels of classification) using this approach alone. Perhaps topic modelling could be a first step that helps you in a subsequent supervised approach.

In any case, if you want to try to provide some words in order to guide the topic modelling task, there are at least two libraries to take a look at:

  • GuidedLDA (a bit old, but maybe coherent with your approach)
  • BERTopic (a breath of fresh air on topic modeling, also implements semi-supervised techniques)

Please share your updates on this task.

Solution 2:[2]

It seems that it is not possible to get multiple levels as asked in my question; however, a workaround is to run the topic modelling approach twice to get 2 different levels. This requires more supervision in defining the topic outputs and what you are trying to capture in each topic.

The approach I found useful after extensive research was CorEx: https://github.com/gregversteeg/corex_topic

It allows you to define the number of topics yourself and, more importantly, the words you want in each topic. I found that this met my need for a more supervised approach.
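As a rough sketch of how anchor words can be passed to CorEx, assuming the `corextopic`, `scikit-learn` (1.0 or later) and `scipy` packages are installed: the function names `missing_anchor_words` and `fit_anchored_corex`, the stop-word setting, and the default `anchor_strength` of 3 are illustrative choices, not part of the original answer.

```python
def missing_anchor_words(anchors, vocabulary):
    """Return anchor words that do not occur in the vocabulary.

    Anchors that never appear in the corpus cannot guide a topic, so it is
    worth checking for them before fitting.
    """
    vocab = set(vocabulary)
    return [word for group in anchors for word in group if word not in vocab]


def fit_anchored_corex(docs, anchors, n_topics, anchor_strength=3):
    """Fit an anchored CorEx topic model on a list of raw text documents.

    `anchors` is a list of word lists; the i-th list seeds the i-th topic.
    """
    # Third-party imports are kept local so that the helper above can be
    # used without these packages installed.
    import scipy.sparse as ss
    from sklearn.feature_extraction.text import CountVectorizer
    from corextopic import corextopic as ct

    # CorEx expects a binary, sparse document-word matrix.
    vectorizer = CountVectorizer(binary=True, stop_words="english")
    doc_word = ss.csr_matrix(vectorizer.fit_transform(docs))
    words = list(vectorizer.get_feature_names_out())

    model = ct.Corex(n_hidden=n_topics, seed=42)
    model.fit(doc_word, words=words, anchors=anchors,
              anchor_strength=anchor_strength)
    return model
```

With anchors such as `[["training", "course"], ["email", "call"]]`, the first two topics are pulled towards those words, and `model.get_topics()` on the returned model lists the words found for each topic. For the two-level setup, one option is to fit one model per level with its own anchor lists.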

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Stefano Fiorucci - anakin87
Solution 2 JordanB