Spark 3.1.2 NoSuchMethodError: org.apache.spark.sql.catalyst.expressions.aggregate.AggregateFunction.toAggregateExpression$default$2()Lscala/Option;

I'm building a jar and running it on an EMR cluster.

I'm using the spark-alchemy version shown below. Calling its hll_init_agg function inside .agg throws the error above.

Code where it's called:

Dataset<Row> groupByDf = df
        .groupBy(functions.col("A"), functions.col("DAY"), functions.col("C"), functions.col("D"))
        .agg(com.swoop.alchemy.spark.expressions.hll.functions.hll_init_agg(functions.col("ID"), 0.005, "AGKN").alias("NEWID"));

Maven dependency:

<dependency>
    <groupId>com.swoop</groupId>
    <artifactId>spark-alchemy_2.12</artifactId>
    <version>1.1.0</version>
</dependency>

Stack Trace:

22/03/23 07:06:22 ERROR ApplicationMaster: User class threw exception: java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.expressions.aggregate.AggregateFunction.toAggregateExpression$default$2()Lscala/Option;
java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.expressions.aggregate.AggregateFunction.toAggregateExpression$default$2()Lscala/Option;
    at com.swoop.alchemy.spark.expressions.WithHelper.withAggregateFunction(WithHelper.scala:13)
    at com.swoop.alchemy.spark.expressions.WithHelper.withAggregateFunction$(WithHelper.scala:10)
    at com.swoop.alchemy.spark.expressions.hll.functions$.withAggregateFunction(HLLFunctions.scala:653)
    at com.swoop.alchemy.spark.expressions.hll.HLLFunctions.hll_init_agg(HLLFunctions.scala:695)
    at com.swoop.alchemy.spark.expressions.hll.HLLFunctions.hll_init_agg$(HLLFunctions.scala:695)
    at com.swoop.alchemy.spark.expressions.hll.functions$.hll_init_agg(HLLFunctions.scala:653)
    at com.swoop.alchemy.spark.expressions.hll.functions.hll_init_agg(HLLFunctions.scala)
    at com.xxx.xxx.xxx.xxxx(xxxx.java:103)
    at com.xxx.xxx.xxxxx(MainClass.java:315)
    at com.xxxx.xxxx.main(MainClass.java:104)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:728)
Exception in thread "main" org.apache.spark.SparkException: Application application_1647925696500_0096 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1196)
    at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1587)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:936)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1015)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1024)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
22/03/23 10:06:14 INFO ShutdownHookManager: Shutdown hook called
22/03/23 10:06:14 INFO ShutdownHookManager: Deleting directory /tmp/spark-a055b36c-0f9a-46f0-9575-893d300705f8
22/03/23 10:06:14 INFO ShutdownHookManager: Deleting directory /tmp/spark-9d482629-a936-4c16-8df4-ac426dcc12ff

Are there any pointers on how to resolve this issue? Any suggestions are much appreciated.



Solution 1:[1]

Resolved the issue. It was a jar version mismatch: the Spark installation on the EMR cluster had spark-catalyst_2.12-3.0.1-amzn-0.jar, but our Maven dependency was spark-catalyst_2.12-3.1.2-amzn-0.jar.

After updating the jar to version 3.1.2, so that both versions matched, I was able to run the job.
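
A mismatch like this can be checked from inside the job by asking the JVM which jar a class was actually loaded from. The following is a minimal sketch using only standard Java reflection. ClasspathProbe and locate are hypothetical names chosen for illustration and are not part of Spark or spark-alchemy; the class name passed in main is the one from the stack trace above.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper (illustrative only, not a Spark or spark-alchemy API).
final class ClasspathProbe {

    // Returns the jar or directory a class was loaded from, or a marker string
    // when the class is missing or has no recorded code source.
    static String locate(String className) {
        try {
            // Load without running static initializers; only the origin is needed.
            Class<?> cls = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader (e.g. java.lang.String) have no CodeSource.
            if (src == null || src.getLocation() == null) {
                return "<bootstrap or unknown>";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "<not on classpath>";
        }
    }

    public static void main(String[] args) {
        // Class named in the NoSuchMethodError from the question.
        String cls = "org.apache.spark.sql.catalyst.expressions.aggregate.AggregateFunction";
        System.out.println(cls + " -> " + locate(cls));
    }
}
```

Calling locate early in the driver's main method should print the path of the spark-catalyst jar that is really in use on the cluster. The version in that file name can then be compared with the one declared in the Maven build.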

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Deepak