Latent Dirichlet Allocation Evaluator
I am using the Apache Spark MLlib implementation of Latent Dirichlet Allocation (LDA). I'd like to tune some parameters, such as topicConcentration, docConcentration, and the number of features for HashingTF. I'd also like to use the CrossValidator provided by MLlib, but I was not able to find an Evaluator for logPerplexity. Can anyone help me? Thank you!
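import org.apache.spark.ml.Pipeline;
import org.apache.spark.ml.PipelineStage;
import org.apache.spark.ml.clustering.LDA;
import org.apache.spark.ml.evaluation.ClusteringEvaluator;
import org.apache.spark.ml.feature.HashingTF;
import org.apache.spark.ml.feature.IDF;
import org.apache.spark.ml.param.ParamMap;
import org.apache.spark.ml.tuning.CrossValidator;
import org.apache.spark.ml.tuning.ParamGridBuilder;

// numTopics (int) and betas (double[]) are defined elsewhere.
// Feature extraction: hashed term frequencies, then IDF weighting.
HashingTF hashingTF = new HashingTF()
    .setInputCol("filtered")
    .setOutputCol("rawFeatures");
IDF idf = new IDF()
    .setInputCol("rawFeatures")
    .setOutputCol("features");
LDA lda = new LDA().setK(numTopics)
    .setMaxIter(100)
    .setDocConcentration(0.05);
// Parameter grid to search over.
ParamMap[] paramGrid = new ParamGridBuilder()
    .addGrid(hashingTF.numFeatures(), new int[]{10, 100, 1000})
    .addGrid(lda.topicConcentration(), betas)
    .addGrid(lda.maxIter(), new int[]{20, 50, 100, 150})
    .addGrid(lda.learningDecay(), new double[]{0.3, 0.5, 0.7, 0.9})
    .build();
// Attempted evaluator: "logPerplexity" is not among ClusteringEvaluator's supported metrics.
ClusteringEvaluator clusteringEvaluator = new ClusteringEvaluator();
clusteringEvaluator.setMetricName("logPerplexity");
Pipeline pipeline = new Pipeline().setStages(new PipelineStage[]{hashingTF, idf, lda});
CrossValidator crossValidator = new CrossValidator()
    .setEstimator(pipeline)
    .setEstimatorParamMaps(paramGrid)
    .setNumFolds(5)
    .setEvaluator(???); // open question: which Evaluator goes here?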
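As far as I can tell, an `Evaluator` only receives the transformed dataset from `CrossValidator`, not the fitted model, so a metric like log perplexity (which spark.ml exposes as `LDAModel.logPerplexity(Dataset)`) does not fit that interface directly. One possible workaround is a manual grid search: fit one model per candidate value, score each on a held-out split, and keep the candidate with the lowest score. The sketch below shows only that selection step in plain Java, without Spark. `PerplexityGridSearch`, `bestCandidate`, and the stand-in score function are hypothetical names; in real use the score function would fit the pipeline with the candidate value and return the model's log perplexity on validation data. The `logPerplexity` helper assumes the per-token definition, i.e. the negated log-likelihood divided by the token count.

```java
import java.util.function.DoubleUnaryOperator;

/** Sketch of manual model selection where a lower score (e.g. log perplexity) is better. */
final class PerplexityGridSearch {

    /** Per-token log perplexity, assumed here to be -logLikelihood / tokenCount. */
    static double logPerplexity(double logLikelihood, double tokenCount) {
        if (tokenCount <= 0) {
            throw new IllegalArgumentException("tokenCount must be positive");
        }
        return -logLikelihood / tokenCount;
    }

    /** Returns the candidate value whose score is smallest. */
    static double bestCandidate(double[] candidates, DoubleUnaryOperator score) {
        if (candidates.length == 0) {
            throw new IllegalArgumentException("no candidates");
        }
        double best = candidates[0];
        double bestScore = score.applyAsDouble(best);
        for (int i = 1; i < candidates.length; i++) {
            double s = score.applyAsDouble(candidates[i]);
            if (s < bestScore) {
                bestScore = s;
                best = candidates[i];
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical candidate values for topicConcentration.
        double[] betas = {0.01, 0.05, 0.1, 0.5};
        // Stand-in for "fit LDA with this value and return the
        // log perplexity on a validation split"; not a real Spark call.
        DoubleUnaryOperator fakeScore = beta -> 7.0 + Math.abs(beta - 0.1);
        System.out.println("best topicConcentration = " + bestCandidate(betas, fakeScore));
    }
}
```

The same loop extends to several parameters by iterating over the combinations in the grid; the trade-off versus `CrossValidator` is that fold splitting and averaging have to be done by hand.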
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
