How to measure coherence for an sklearn LDA model (non-gensim LDA model)?

I have tried two techniques, but they give different results, and I want to be sure which one to go with.

Method 1: I tried using

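from tmtoolkit.topicmod.evaluate import metric_coherence_gensim

# vocab must line up with the columns of dtm. The keys of vocabulary_ come back
# in insertion order, not column order, so take the feature names instead
# (get_feature_names_out() needs scikit-learn >= 1.0).
metric_coherence_gensim(measure='u_mass',
                        top_n=25,
                        topic_word_distrib=lda.components_,
                        dtm=dtm,
                        vocab=tfidf_vect.get_feature_names_out(),
                        return_mean=True)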

The source mentioned that a decent coherence score should be between -14 and +14. An explanation of this range would also help.
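As a point of reference for that range, each pairwise term in the u_mass formula used in Method 2 below is log((D_ij + 1) / D_i), where D_i is the number of documents containing word i and D_ij the number containing both words. Since 0 <= D_ij <= D_i <= N for a corpus of N documents, a single term can only fall between -log(N) and log(2). The sketch below just evaluates those bounds; the function name is hypothetical, and it says nothing about how a particular library aggregates the terms, so it neither confirms nor refutes the quoted range.

```python
import math


def umass_pair_bounds(n_docs):
    """Return (lowest, highest) possible value of one u_mass pair term.

    Hypothetical helper: assumes every top word occurs in at least one
    document, so that D_i >= 1.
    """
    lowest = -math.log(n_docs)   # D_ij = 0 and D_i = n_docs
    highest = math.log(2)        # D_ij = D_i = 1
    return lowest, highest


# Example: bounds for a corpus of 10,000 documents
low, high = umass_pair_bounds(10_000)
print(low, high)
```

A mean over pairs stays inside these bounds, while a sum over pairs does not.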

Method 2: I wrote my own functions to calculate the score without any built-in library.

import math

def get_umass_score(dt_matrix, i, j):
    # Binarize: 1 where the word occurs in the document, 0 otherwise
    zo_matrix = (dt_matrix > 0).astype(int)
    col_i, col_j = zo_matrix[:, i], zo_matrix[:, j]
    # Documents containing both words have a column sum of 2
    col_ij = col_i + col_j
    col_ij = (col_ij == 2).astype(int)
    Di, Dij = col_i.sum(), col_ij.sum()
    return math.log((Dij + 1) / Di)

def get_topic_coherence(dt_matrix, topic, n_top_words):
    # Pair each weight with its word index, then keep the highest-weighted words
    indexed_topic = zip(topic, range(0, len(topic)))
    topic_top = sorted(indexed_topic, key=lambda x: x[0], reverse=True)[0:n_top_words]
    coherence = 0
    for j_index in range(0, len(topic_top)):
        # Pair word j with every word ranked above it, including its
        # immediate predecessor
        for i_index in range(0, j_index):
            i = topic_top[i_index][1]
            j = topic_top[j_index][1]
            coherence += get_umass_score(dt_matrix, i, j)
    return coherence


def get_average_topic_coherence(dt_matrix, topics, n_top_words):
    total_coherence = 0
    for i in range(0, len(topics)):
        total_coherence += get_topic_coherence(dt_matrix, topics[i], n_top_words)
    return total_coherence / len(topics)

I got this from a Stack Overflow post. Credit to the original author, but I get very large values depending on the value I pass for n_top_words.
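One likely reason for the dependence on n_top_words: get_topic_coherence sums the pairwise scores, and the number of pairs grows as n(n-1)/2, so the total grows in magnitude with n. A minimal sketch of a pair-averaged variant follows, assuming a dense NumPy document-term matrix, at least two top words, and that every top word occurs in at least one document. The name mean_umass is hypothetical and not part of any library.

```python
import math
from itertools import combinations

import numpy as np


def mean_umass(dt_matrix, top_word_ids):
    """Average u_mass score over all pairs of top words (hypothetical helper).

    top_word_ids must be ordered from highest to lowest topic weight.
    """
    present = np.asarray(dt_matrix) > 0   # True where a word occurs in a document
    scores = []
    # combinations() keeps the input order, so i is always ranked above j
    for i, j in combinations(top_word_ids, 2):
        d_i = present[:, i].sum()                       # docs containing word i
        d_ij = (present[:, i] & present[:, j]).sum()    # docs containing both
        scores.append(math.log((d_ij + 1) / d_i))
    return sum(scores) / len(scores)


# Toy document-term matrix: 4 documents, 3 words
toy_dtm = np.array([[1, 1, 0],
                    [1, 0, 1],
                    [1, 1, 1],
                    [0, 1, 0]])
print(mean_umass(toy_dtm, [0, 1, 2]))
```

With this normalization, scores computed for different values of n_top_words are on a comparable scale. Whether it matches the aggregation used by metric_coherence_gensim is an assumption that would need to be checked against that library's documentation.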

Can someone tell me which method is reliable, or whether there is a better way to compute the coherence score for sklearn LDA models?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
