Coherence scores (u_mass) for LDA models are highly volatile when varying the number of topics

Why does coherence vary so much as the number of topics changes?

I am using Gensim's CoherenceModel to calculate u_mass coherence scores for a set of Latent Dirichlet Allocation (LDA) topic models, each trained with a different number of topics (k). My aim is to optimise k.
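For context, here is a minimal sketch of the kind of sweep I am running. The toy documents stand in for my real preprocessed corpus, and names like `coherences` and `best_k` are just illustrative:

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel, CoherenceModel

# Toy documents standing in for the real preprocessed corpus.
texts = [
    ["human", "interface", "computer"],
    ["survey", "user", "computer", "system", "response"],
    ["eps", "user", "interface", "system"],
    ["system", "human", "system", "eps"],
    ["user", "response", "time"],
    ["trees"],
    ["graph", "trees"],
    ["graph", "minors", "trees"],
    ["graph", "minors", "survey"],
]
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]

# Train one LDA model per candidate k and score it with u_mass.
coherences = {}
for k in range(2, 11):
    lda = LdaModel(corpus=corpus, id2word=dictionary,
                   num_topics=k, random_state=42, passes=10)
    cm = CoherenceModel(model=lda, corpus=corpus,
                        dictionary=dictionary, coherence="u_mass")
    coherences[k] = cm.get_coherence()

# u_mass scores are typically negative, so "higher" means closer to zero.
best_k = max(coherences, key=coherences.get)
print(coherences)
print("best k:", best_k)
```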

I understand that, in general, a higher u_mass coherence score is better (u_mass values are typically negative, so "higher" means closer to zero), and that k should therefore be selected where coherence is maximised (according to sources here, here and here). However, when I plot the coherence scores against k, the graph is highly volatile. The same pattern shows up in other examples I've found online:

[Plots: my coherence-vs-k graph, plus two similar examples found online]

Can someone explain why coherence varies so much as k changes? If I understand it correctly, each topic receives its own coherence score based on how often its top words co-occur, and the global score is the average across all topics. So if increasing k produces a new topic whose top words rarely co-occur, the global coherence score falls significantly; if a further change to k yields a topic whose words co-occur frequently, the global score rises again. That would explain the large swings in coherence as k changes (see the per-topic sketch below).
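If it helps, this understanding can be checked with per-topic scores. This is a sketch reusing `lda`, `corpus` and `dictionary` from the snippet above; `get_coherence_per_topic()` is part of Gensim's CoherenceModel API:

```python
import numpy as np

# Per-topic u_mass scores for the last model trained in the loop above.
cm = CoherenceModel(model=lda, corpus=corpus,
                    dictionary=dictionary, coherence="u_mass")
per_topic = cm.get_coherence_per_topic()   # one score per topic
print(per_topic)

# The global score aggregates the per-topic scores (arithmetic mean
# by default), so one incoherent topic drags the whole average down.
print(np.mean(per_topic), "==", cm.get_coherence())
```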

A small additional question: is k optimised where we see local peaks in coherence?

Any help or additional information is greatly appreciated. Thanks!



Source: Stack Overflow, licensed under CC BY-SA 3.0.