I keep getting a path error when attempting to conduct an LDA with the Mallet package

I'm trying to run Latent Dirichlet Allocation (LDA) on a list of texts stored in an Excel sheet.

I download and unpack the Mallet package in my notebook with the following commands:

!wget http://mallet.cs.umass.edu/dist/mallet-2.0.8.zip
!unzip mallet-2.0.8.zip

I then set the MALLET_HOME environment variable and the path to the Mallet executable with the following code:

import os

os.environ['MALLET_HOME'] = 'C:/Users/btmin/Downloads/mallet-2.0.8 (1)'
mallet_path = 'C:/Users/btmin/Downloads/mallet-2.0.8 (1)/bin/mallet.bat'
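Before handing this path to gensim, it may help to check what the Python process itself can see. The following is only a minimal sketch: `describe_mallet_path` is a hypothetical helper name, not part of gensim or Mallet, and it reports just a few basic facts (the operating system Python is running on, whether the file exists from that process's point of view, and whether the path contains a space or points at a Windows batch file).

```python
import os
import platform


def describe_mallet_path(mallet_path):
    """Return basic facts about a candidate Mallet launcher path (hypothetical helper)."""
    return {
        # Operating system of the machine running this Python process,
        # e.g. 'Linux', 'Windows' or 'Darwin'.
        "system": platform.system(),
        # True only if this process can actually see a file at that path.
        "exists": os.path.isfile(mallet_path),
        # Spaces can break a command line that is passed to a shell unquoted.
        "has_space": " " in mallet_path,
        # .bat launchers are Windows batch files.
        "is_bat": mallet_path.lower().endswith(".bat"),
    }


info = describe_mallet_path('C:/Users/btmin/Downloads/mallet-2.0.8 (1)/bin/mallet.bat')
print(info)
```

If `exists` comes back `False`, or `system` is not `'Windows'` while the path points at a `.bat` file on a `C:` drive, the path is probably not valid for the machine that is actually executing the notebook.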

Finally, I try to evaluate which n-topic model offers the best coherence score as follows:

coherence_scores = {}
for number_of_topics in range(2,15):
  print(number_of_topics)
  ldamallet30 = LdaMallet(mallet_path, corpus=doc_term_matrix, num_topics=number_of_topics, id2word=dictionary)
  gensimmodel30 = gensim.models.wrappers.ldamallet.malletmodel2ldamodel(ldamallet30)
  coherencemodel = CoherenceModel(model=gensimmodel30, texts=doc_clean, dictionary=dictionary, coherence='c_v') #try different coherences
  coherence_scores[number_of_topics]=coherencemodel.get_coherence()
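Once the loop completes, the topic count with the highest coherence can be read straight out of the dictionary. The sketch below uses made-up scores purely for illustration; real values would come from `get_coherence()`.

```python
# Hypothetical scores for illustration only; these are not real results.
coherence_scores = {2: 0.31, 3: 0.38, 4: 0.35}

# Pick the key (number of topics) whose value (coherence) is largest.
best_num_topics = max(coherence_scores, key=coherence_scores.get)
print(best_num_topics, coherence_scores[best_num_topics])
```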

Unfortunately, I get the following error:

2
---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
<ipython-input-84-a24b98543e01> in <module>()
      2 for number_of_topics in range(2,15):
      3   print(number_of_topics)
----> 4   ldamallet30 = LdaMallet(mallet_path, corpus=doc_term_matrix, num_topics=number_of_topics, id2word=dictionary)
      5   gensimmodel30 = gensim.models.wrappers.ldamallet.malletmodel2ldamodel(ldamallet30)
      6   coherencemodel = CoherenceModel(model=gensimmodel30, texts=doc_clean, dictionary=dictionary, coherence='c_v') #try different coherences

3 frames
/usr/local/lib/python3.7/dist-packages/gensim/utils.py in check_output(stdout, *popenargs, **kwargs)
   1930             error = subprocess.CalledProcessError(retcode, cmd)
   1931             error.output = output
-> 1932             raise error
   1933         return output
   1934     except KeyboardInterrupt:

CalledProcessError: Command 'C:/Users/btmin/Downloads/mallet-2.0.8 (1)/bin/mallet.bat import-file --preserve-case --keep-sequence --remove-stopwords --token-regex "\S+" --input /tmp/772bcd_corpus.txt --output /tmp/772bcd_corpus.mallet' returned non-zero exit status 2.

Any ideas? Answers to similar questions suggest that the path is set incorrectly, but I can't see what is wrong with it.
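The traceback only reports the exit status, not what the command printed. One way to see the underlying message might be to run a command through `subprocess` directly and capture its standard error. This is a minimal sketch, assuming Python 3.7 or later: `run_and_capture` is a hypothetical helper, and the demo command is a stand-in Python one-liner that exits with status 2, not the Mallet command itself.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run a command and return (exit_code, stdout, stderr).

    A string is passed to the shell; a list of arguments is run directly.
    """
    result = subprocess.run(
        cmd,
        shell=isinstance(cmd, str),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout, result.stderr


# Stand-in command for illustration: writes to stderr and exits with status 2.
demo_cmd = [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(2)"]
code, out, err = run_and_capture(demo_cmd)
print(code, err)
```

Passing the exact command string from the `CalledProcessError` message to the same helper should show whatever the shell or Mallet wrote to standard error. Note also that the traceback references `/usr/local/lib/python3.7/dist-packages` and `/tmp/...`, which suggests the code is executing on a Linux host rather than on the Windows machine where the `C:/Users/...` path lives.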



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
