LDA: topic model gensim gives same set of topics

Why do some topics in my gensim LDA model show the same set of words? I used the parameters below, and I checked that there are no duplicate documents in my corpus.

import gensim

lda_model = gensim.models.ldamodel.LdaModel(
    corpus=MY_CORPUS,
    id2word=WORD_AND_ID,
    num_topics=4,
    minimum_probability=minimum_probability,
    random_state=100,
    update_every=1,
    chunksize=100,
    passes=10,
    alpha='auto',  # other options: 'symmetric', 'asymmetric'
    per_word_topics=True,
)

Results

[
(0, '0.004*lily + 0.01*rose + 0.00*jasmine'),
(1, '0.005*geometry + 0.07*algebra + 0.01*calculation'),
(2, '0.003*painting + 0.001*brush + 0.01*colors'),
(3, '0.005*geometry + 0.07*algebra + 0.01*calculation')
]

Note that topics 1 and 3 are identical.

Solution 1:^[1]

Each topic is a distribution over the whole vocabulary, with every word weighted differently. When a topic is displayed (e.g. with lda_model.show_topics()), you see only the few words with the largest weights. Two topics that look identical in that short list may still differ among the remaining vocabulary.

You can control the number of displayed words to inspect the remaining weights:

lda_model.show_topics(num_topics=4, num_words=10, log=False, formatted=True)

Increase the num_words parameter to include even more words.
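
One way to check whether two topics really differ beyond their top few words is to compare longer word lists. The sketch below is a minimal illustration, not part of gensim: topic_overlap is a hypothetical helper, and the words and weights past the first three are made up for the example. It assumes each topic is a list of (word, weight) pairs, which is the shape lda_model.show_topic(topic_id, topn=...) returns.

```python
def topic_overlap(topic_a, topic_b):
    """Jaccard overlap between the word sets of two topics.

    Each topic is a list of (word, weight) pairs.
    Returns 1.0 for identical word sets and 0.0 for disjoint ones.
    """
    words_a = {word for word, _ in topic_a}
    words_b = {word for word, _ in topic_b}
    union = words_a | words_b
    if not union:
        return 1.0  # two empty topics are trivially identical
    return len(words_a & words_b) / len(union)


# Illustrative data only: the first three entries mirror the question,
# the last two entries in each list are invented for this sketch.
topic_1 = [("algebra", 0.07), ("calculation", 0.01), ("geometry", 0.005),
           ("matrix", 0.004), ("proof", 0.003)]
topic_3 = [("algebra", 0.07), ("calculation", 0.01), ("geometry", 0.005),
           ("angle", 0.004), ("triangle", 0.003)]

overlap_top3 = topic_overlap(topic_1[:3], topic_3[:3])
overlap_top5 = topic_overlap(topic_1, topic_3)
print(f"top 3 overlap: {overlap_top3:.2f}")
print(f"top 5 overlap: {overlap_top5:.2f}")
```

With a trained model, the two lists would instead come from something like lda_model.show_topic(1, topn=50) and lda_model.show_topic(3, topn=50). An overlap that stays near 1.0 as topn grows suggests the topics are genuinely redundant rather than just similar at the top.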

There are also other possible causes:

the number of topics should be different (e.g. 3),
minimum_probability should be smaller (what value do you use?),
the number of passes should be larger,
chunksize should be smaller,
the corpus should be larger (what is its size?) or stripped of stop words (did you do that?).

I encourage you to experiment with different values of these parameters to check whether any combination works better.
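
Of the suggestions above, stop word removal is the one that happens before training, so it may help to see it in isolation. This is a minimal sketch under stated assumptions: STOP_WORDS is a deliberately tiny, hypothetical list, strip_stop_words is an illustrative helper rather than a gensim function, and the documents are assumed to be already tokenized.

```python
# Hypothetical, deliberately tiny stop word list for illustration;
# a real pipeline would use a fuller, language-appropriate list.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}


def strip_stop_words(tokenized_docs, stop_words=STOP_WORDS):
    """Return a copy of the documents with stop words removed.

    tokenized_docs is a list of documents, each a list of string tokens.
    Matching is case-insensitive; surviving tokens keep their original case.
    """
    return [
        [token for token in doc if token.lower() not in stop_words]
        for doc in tokenized_docs
    ]


docs = [
    ["The", "rose", "is", "red"],
    ["geometry", "and", "algebra"],
]
cleaned = strip_stop_words(docs)
print(cleaned)
```

The cleaned token lists could then be turned into the dictionary and bag-of-words corpus passed to LdaModel, for example with gensim.corpora.Dictionary and its doc2bow method.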

Solution 2:^[2]

Set the alpha parameter to 50/k, where k is the number of topics, and set the eta parameter explicitly as well (eta = 0.1).

For example:

lda_model = gensim.models.ldamodel.LdaModel(
    corpus=corpus,
    id2word=id2word,
    num_topics=4,
    update_every=1,
    chunksize=100,
    passes=10,
    alpha=50/4,
    eta=0.1,
    per_word_topics=True,
)
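
To make the arithmetic of this suggestion explicit, the sketch below computes the two priors for a given number of topics. heuristic_priors is a hypothetical helper written for this illustration, not a gensim function, and the 50/k and 0.1 values are simply the ones this answer proposes, not settings verified to fix duplicate topics.

```python
def heuristic_priors(num_topics, eta=0.1):
    """Return the alpha and eta values suggested in this answer.

    alpha is 50 divided by the number of topics; eta defaults to 0.1.
    """
    if num_topics <= 0:
        raise ValueError("num_topics must be a positive integer")
    return {"alpha": 50 / num_topics, "eta": eta}


priors = heuristic_priors(4)
print(priors)
```

For four topics this gives alpha = 12.5. The resulting dictionary could be unpacked into the constructor call, e.g. LdaModel(..., num_topics=4, **priors), since gensim accepts a scalar for each prior and treats it as symmetric.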

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	sophros
Solution 2	yoones_khosravi
