I keep getting a path error when attempting to run LDA with the MALLET package
I'm attempting to run Latent Dirichlet Allocation (LDA) on a list of texts stored in an Excel sheet.
I download and unpack the MALLET package with the following code:
```
!wget http://mallet.cs.umass.edu/dist/mallet-2.0.8.zip
!unzip mallet-2.0.8.zip
```
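The `!wget`/`!unzip` lines are IPython shell escapes, so they run on whichever machine hosts the notebook kernel and extract into that kernel's current working directory, not necessarily into a local Downloads folder. As a minimal sketch, assuming the archive unpacks to a `mallet-2.0.8/` folder containing `bin/mallet` (Unix-like hosts) and `bin/mallet.bat` (Windows), both settings could be derived from where the unzip actually happened. The helper name `expected_mallet_path` is hypothetical.

```python
import os
import platform
from pathlib import Path


def expected_mallet_path(extract_dir: str = ".", version: str = "2.0.8") -> Path:
    """Hypothetical helper: where the MALLET launcher should be after the
    archive is unzipped into extract_dir (assumes it unpacks to mallet-<version>/)."""
    home = Path(extract_dir).resolve() / f"mallet-{version}"
    # Assumption: bin/ holds a shell script ("mallet") for Unix-like hosts and
    # a batch file ("mallet.bat") for Windows; pick the one matching this OS.
    launcher = "mallet.bat" if platform.system() == "Windows" else "mallet"
    return home / "bin" / launcher


# Usage sketch: derive both settings from the directory the notebook
# unzipped into (the kernel's current working directory).
mallet_path = str(expected_mallet_path(os.getcwd()))
os.environ["MALLET_HOME"] = str(Path(mallet_path).parents[1])
print(mallet_path, os.path.isfile(mallet_path))
```

Deriving `MALLET_HOME` from the launcher path keeps the two settings from drifting apart.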
I also set the `MALLET_HOME` environment variable and the path to the MALLET launcher with the following code:
```python
import os

os.environ['MALLET_HOME'] = 'C:/Users/btmin/Downloads/mallet-2.0.8 (1)'
mallet_path = 'C:/Users/btmin/Downloads/mallet-2.0.8 (1)/bin/mallet.bat'
```
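The launcher path has to exist on the machine that actually runs the Python kernel, and it ends up inside a command string. As a sketch (the helper `mallet_path_problems` is hypothetical, not a gensim API), a quick pre-flight check could flag the usual suspects: a missing file, spaces or parentheses that a POSIX shell may split on or reject if the path is not quoted, and a Windows `.bat` launcher on a non-Windows host.

```python
import os
import platform
import re


def mallet_path_problems(path: str) -> list:
    """Hypothetical pre-flight check (not a gensim API): return reasons a
    MALLET launcher path is likely to fail when run as a command."""
    problems = []
    if not os.path.isfile(path):
        problems.append("does not exist on this machine")
    # Whitespace and parentheses are special to POSIX shells; if the path is
    # pasted unquoted into a command string, they split or break the command.
    if re.search(r"[\s()]", path):
        problems.append("contains spaces or parentheses")
    # A .bat file is a Windows batch script and won't run on Linux/macOS.
    if path.lower().endswith(".bat") and platform.system() != "Windows":
        problems.append("is a .bat launcher on a non-Windows host")
    return problems


for problem in mallet_path_problems(
    "C:/Users/btmin/Downloads/mallet-2.0.8 (1)/bin/mallet.bat"
):
    print("mallet_path", problem)
```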
Finally, I try to find which number of topics gives the best coherence score:
```python
import gensim
from gensim.models import CoherenceModel
from gensim.models.wrappers import LdaMallet

# doc_term_matrix, dictionary and doc_clean are built earlier (not shown)
coherence_scores = {}
for number_of_topics in range(2, 15):
    print(number_of_topics)
    ldamallet30 = LdaMallet(mallet_path, corpus=doc_term_matrix, num_topics=number_of_topics, id2word=dictionary)
    gensimmodel30 = gensim.models.wrappers.ldamallet.malletmodel2ldamodel(ldamallet30)
    coherencemodel = CoherenceModel(model=gensimmodel30, texts=doc_clean, dictionary=dictionary, coherence='c_v')  # try different coherences
    coherence_scores[number_of_topics] = coherencemodel.get_coherence()
```
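Once the loop has filled `coherence_scores` (a dict keyed by topic count), picking the best model is an argmax over that dict, assuming higher `c_v` coherence is better. A minimal sketch; the helper name `best_num_topics` and the example scores are hypothetical, for illustration only, not real results.

```python
def best_num_topics(scores: dict) -> int:
    """Return the topic count with the highest coherence score."""
    if not scores:
        raise ValueError("no coherence scores were computed")
    # max() over the keys, ranked by each key's score
    return max(scores, key=scores.get)


# Hypothetical scores, for illustration only (not real results).
example_scores = {2: 0.41, 3: 0.47, 4: 0.44}
print(best_num_topics(example_scores))  # prints 3
```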
Unfortunately, I get the following error:
```
2
---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
<ipython-input-84-a24b98543e01> in <module>()
      2 for number_of_topics in range(2,15):
      3     print(number_of_topics)
----> 4     ldamallet30 = LdaMallet(mallet_path, corpus=doc_term_matrix, num_topics=number_of_topics, id2word=dictionary)
      5     gensimmodel30 = gensim.models.wrappers.ldamallet.malletmodel2ldamodel(ldamallet30)
      6     coherencemodel = CoherenceModel(model=gensimmodel30, texts=doc_clean, dictionary=dictionary, coherence='c_v') #try different coherences

3 frames
/usr/local/lib/python3.7/dist-packages/gensim/utils.py in check_output(stdout, *popenargs, **kwargs)
   1930             error = subprocess.CalledProcessError(retcode, cmd)
   1931             error.output = output
-> 1932             raise error
   1933         return output
   1934     except KeyboardInterrupt:

CalledProcessError: Command 'C:/Users/btmin/Downloads/mallet-2.0.8 (1)/bin/mallet.bat import-file --preserve-case --keep-sequence --remove-stopwords --token-regex "\S+" --input /tmp/772bcd_corpus.txt --output /tmp/772bcd_corpus.mallet' returned non-zero exit status 2.
```
Any ideas? Answers to similar errors suggest the path is set incorrectly, but I can't see what is wrong with it.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
