Debugging an implementation of the Viterbi algorithm for an HMM

I'm having trouble finding out what's wrong with my implementation of the Viterbi algorithm, which I'm using for NER (Named Entity Recognition). The `emission_log_probs`, `transition_log_probs`, and `init_log_probs` matrices are the emission (S×V), transition (S×S), and initial-state (S) matrices respectively, containing log probabilities. Here, S is the number of states and V is the number of words in the vocabulary. The words (tokens) are indexed in a `word_indexer`, which simply maps them from string to int:

    N = len(sentence_tokens) # number of tokens
    S = len(self.tag_indexer) # number of states
    scores = np.empty((N, S), 'd') # the log probabilities of the most likely path so far
    backtrace = np.empty((N, S), 'B') # the states that correspond to the max at each step

    # initialize
    first_word = self.word_indexer.index_of(sentence_tokens[0].word) # get index of first word
    scores[0, :] = self.init_log_probs + self.emission_log_probs[:, first_word]
    backtrace[0, :] = 0

    # recurrence
    for i in range(1, N):
        word = self.word_indexer.index_of(sentence_tokens[i].word)
        for s in range(0, S):
            scores[i, s] = self.emission_log_probs[s, word] + np.max(self.transition_log_probs[:, s] + scores[i-1,:])
            backtrace[i, s] = np.argmax(self.transition_log_probs[:, s] + scores[i-1,:])

    # build sequence
    x = np.empty(N, 'B')
    x[-1] = np.argmax(scores[N-1, :])
    for i in reversed(range(1, N)):
        x[i - 1] = backtrace[i, x[i]]

    # x is final sequence of states
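
To sanity-check the recurrence itself, I can compare it against brute-force enumeration of all state sequences on a toy HMM (the numbers below are made up for illustration, and `viterbi` is a standalone rewrite of the loop above, so the only assumption is that it mirrors my class method faithfully):

```python
import numpy as np
from itertools import product

# Toy 2-state, 3-word HMM with hand-picked (made-up) probabilities, in log space.
init_log_probs = np.log(np.array([0.6, 0.4]))            # (S,)
transition_log_probs = np.log(np.array([[0.7, 0.3],
                                        [0.4, 0.6]]))    # (S, S), rows = from-state
emission_log_probs = np.log(np.array([[0.5, 0.4, 0.1],
                                      [0.1, 0.3, 0.6]]))  # (S, V)

def viterbi(obs):
    """Same recurrence as in the question, on a plain list of word indices."""
    N, S = len(obs), init_log_probs.shape[0]
    scores = np.empty((N, S))
    backtrace = np.zeros((N, S), dtype=np.intp)
    scores[0, :] = init_log_probs + emission_log_probs[:, obs[0]]
    for i in range(1, N):
        for s in range(S):
            cand = transition_log_probs[:, s] + scores[i - 1, :]
            scores[i, s] = emission_log_probs[s, obs[i]] + np.max(cand)
            backtrace[i, s] = np.argmax(cand)
    x = np.empty(N, dtype=np.intp)
    x[-1] = np.argmax(scores[-1, :])
    for i in reversed(range(1, N)):
        x[i - 1] = backtrace[i, x[i]]
    return list(x)

def brute_force(obs):
    """Score every possible state sequence and keep the best one."""
    S = init_log_probs.shape[0]
    best, best_path = -np.inf, None
    for path in product(range(S), repeat=len(obs)):
        lp = init_log_probs[path[0]] + emission_log_probs[path[0], obs[0]]
        for i in range(1, len(obs)):
            lp += transition_log_probs[path[i - 1], path[i]] \
                  + emission_log_probs[path[i], obs[i]]
        if lp > best:
            best, best_path = lp, list(path)
    return best_path

obs = [0, 2, 1, 2]
assert viterbi(obs) == brute_force(obs)
```

On this toy model the dynamic-programming result matches exhaustive search, which at least suggests the recurrence and backtrace logic aren't the problem.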

It seems that I should be achieving a much higher accuracy on the same dataset, as my instructor suggested; however, I'm stuck on where I'm going wrong. Does anyone see anything immediately concerning?
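
One thing I haven't ruled out (an assumption on my part, since it depends on how `word_indexer` is implemented): if `index_of` returns -1 for words unseen at training time, NumPy will silently treat that as an index for the last vocabulary column rather than raising an error:

```python
import numpy as np

# Small made-up emission matrix (2 states, 3 vocabulary words), in log space.
emission_log_probs = np.log(np.array([[0.5, 0.4, 0.1],
                                      [0.1, 0.3, 0.6]]))

unknown_word_idx = -1  # what many indexers return for an unseen token

# No IndexError: -1 wraps around to the last column (word index V-1),
# so every unknown word quietly gets that word's emission scores.
col = emission_log_probs[:, unknown_word_idx]
assert np.allclose(col, emission_log_probs[:, 2])
```

If that's what's happening, every out-of-vocabulary token would be scored as if it were the last word in the vocabulary, which could plausibly drag down accuracy.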



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
