IllegalArgumentException: requirement failed: rawPredictionCol vectors must have length=2, but got 3 while testing a model in Apache Spark
I'm trying to build and evaluate a model in Apache Spark 3.1.1 using the OneVsRest strategy (oneR in the code below). I have a .csv file with normalized data (all values are doubles, but some are very close to 0).
I was reading the MLlib main guide on OneVsRest, and my code is very similar to this:
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.ml.classification.LogisticRegression;
import org.apache.spark.ml.classification.OneVsRest;
import org.apache.spark.ml.classification.OneVsRestModel;
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator;
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator;
import org.apache.spark.ml.feature.LabeledPoint;
import org.apache.spark.mllib.evaluation.MulticlassMetrics;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkSession session = SparkSession
        .builder()
        .appName("Spark test")
        .master("local")
        .getOrCreate();
JavaRDD<LabeledPoint> data = loadData(session, "path.csv");

// one binary logistic regression per class, wrapped in one-vs-rest
LogisticRegression logisticRegression = new LogisticRegression().setMaxIter(20);
OneVsRest oneR = new OneVsRest().setClassifier(logisticRegression);

BinaryClassificationEvaluator evaluator = new BinaryClassificationEvaluator();
// declared but never used below (MulticlassMetrics expects (prediction, label) pairs)
MulticlassMetrics metrics = new MulticlassMetrics(data.rdd());
MulticlassClassificationEvaluator multiEvaluator = new MulticlassClassificationEvaluator()
        .setMetricName("accuracy");

// 70/30 train/test split
JavaRDD<LabeledPoint>[] javaRDDS = data.randomSplit(new double[]{0.7, 0.3});
JavaRDD<LabeledPoint> trainingRDD = javaRDDS[0], testRDD = javaRDDS[1];
Dataset<Row> trainingDataset = session.createDataFrame(trainingRDD, LabeledPoint.class);
Dataset<Row> testDataset = session.createDataFrame(testRDD, LabeledPoint.class);

OneVsRestModel oneRModel = oneR.fit(trainingDataset);
Dataset<Row> oneRPredictions = oneRModel.transform(testDataset).select("prediction", "label");
double oneRAcc = evaluator.evaluate(oneRPredictions); // Classification.java:64 - throws
System.out.println("OneR: \r\n");
System.out.println("Accuracy: " + oneRAcc);
System.out.println("--------------------------------");
session.close();
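(loadData isn't shown in the original post; a rough sketch of such a helper, under the assumption that column 0 of the headerless CSV is the label and the remaining columns are the already-normalized features, and with one extra import, org.apache.spark.ml.linalg.Vectors:)

// Hypothetical sketch of the loadData helper: reads the CSV and maps each
// row to a LabeledPoint; assumes label in column 0, features in the rest.
private static JavaRDD<LabeledPoint> loadData(SparkSession session, String path) {
    return session.read()
            .option("inferSchema", "true")
            .csv(path)
            .javaRDD()
            .map(row -> {
                double[] features = new double[row.size() - 1];
                for (int i = 1; i < row.size(); i++) {
                    // Number cast in case a column was inferred as an integer type
                    features[i - 1] = ((Number) row.get(i)).doubleValue();
                }
                return new LabeledPoint(((Number) row.get(0)).doubleValue(),
                        Vectors.dense(features));
            });
}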
Running the main snippet throws this exception:
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: rawPredictionCol vectors must have length=2, but got 3
at scala.Predef$.require(Predef.scala:281)
at org.apache.spark.ml.evaluation.BinaryClassificationEvaluator.$anonfun$getMetrics$1(BinaryClassificationEvaluator.scala:126)
at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.ml.evaluation.BinaryClassificationEvaluator.getMetrics(BinaryClassificationEvaluator.scala:126)
at org.apache.spark.ml.evaluation.BinaryClassificationEvaluator.evaluate(BinaryClassificationEvaluator.scala:100)
at Classification.main(Classification.java:64)
Why doesn't this code work? I thought the problem was the .select("prediction", "label"), because I don't know exactly what the Dataset<Row> returned by transform contains, but the requirement that vectors have length 2 rather than 3 is strange: I'm doing multiclass classification with 3 classes.
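(One way to see what transform actually returns is to print the schema before the select; in Spark 3.1, OneVsRestModel adds a rawPrediction vector with one score per class alongside the double-valued prediction column, and rawPrediction is the column BinaryClassificationEvaluator reads by default:)

Dataset<Row> transformed = oneRModel.transform(testDataset);
// expect the input columns (features, label) plus the columns OneVsRestModel
// adds: rawPrediction (vector, one score per class) and prediction (double)
transformed.printSchema();
transformed.show(5, false);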
Edit
I was using the BinaryClassificationEvaluator evaluator instead of the MulticlassClassificationEvaluator multiEvaluator by mistake. Now the error message makes sense: BinaryClassificationEvaluator reads rawPrediction vectors and expects one score per binary class (length 2), while OneVsRest over 3 classes produces length-3 vectors.
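Given that, the fix is a one-line change: evaluate with the multiclass evaluator that was already declared above. A minimal sketch (MulticlassClassificationEvaluator's default predictionCol and labelCol are "prediction" and "label", which is exactly what the select above keeps):

// evaluates the double-valued "prediction" column against "label",
// so no rawPrediction column is needed at all
double oneRAcc = multiEvaluator.evaluate(oneRPredictions);
System.out.println("Accuracy: " + oneRAcc);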
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow