'Exception after Setting property 'spark.sql.hive.metastore.jars' in 'spark-defaults.conf'
Given below is the version of Spark & Hive I have installed in my system
Spark : spark-1.4.0-bin-hadoop2.6
Hive : apache-hive-1.0.0-bin
I have configured the Hive installation to use MySQL as Metastore. The goal is to access the MySQL Metastore & execute HiveQL queries inside spark-shell(using HiveContext)
So far I am able to execute the HiveQL queries by accessing the Derby Metastore(As described here, believe Spark-1.4 comes bundled with Hive 0.13.1 which in turn uses the internal Derby database as Metastore)
Then I tried to point spark-shell to my external Metastore(MySQL in this case) by setting the property(as suggested here) given below in $SPARK_HOME/conf/spark-defaults.conf,
spark.sql.hive.metastore.jars /home/mountain/hv/lib:/home/mountain/hp/lib
I have also copied $HIVE_HOME/conf/hive-site.xml into $SPARK_HOME/conf. But I am getting the following exception when I start the spark-shell
mountain@mountain:~/del$ spark-shell
Spark context available as sc.
java.lang.ClassNotFoundException: java.lang.NoClassDefFoundError:
org/apache/hadoop/hive/ql/session/SessionState when creating Hive client
using classpath: file:/home/mountain/hv/lib/, file:/home/mountain/hp/lib/
Please make sure that jars for your version of hive and hadoop are
included in the paths passed to spark.sql.hive.metastore.jars.
Am I missing something (or) not setting the property spark.sql.hive.metastore.jars correctly?
Solution 1:[1]
Note: In Linux Mint verified.
If you are setting properties in spark-defaults.conf, spark will take those settings only when you submit your job using spark-submit.
file: spark-defaults.conf
spark.driver.extraJavaOptions -Dlog4j.configuration=file:log4j.properties -Dspark.yarn.app.container.log.dir=app-logs -Dlogfile.name=hello-spark
spark.jars.packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1,org.apache.spark:spark-avro_2.12:3.0.1
In the terminal run your job say wordcount.py
spark-submit /path-to-file/wordcount.py
If you want to run your job in development mode from an IDE then you should use config() method. Here we will set Kafka jar packages
spark = SparkSession.builder \
.appName('Hello Spark') \
.master('local[3]') \
.config("spark.streaming.stopGracefullyOnShutdown", "true") \
.config("spark.jars.packages", "org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1") \
.getOrCreate()
Solution 2:[2]
Corrupted version of hive-site.xml will cause this... please copy the correct hive-site.xml
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Pramod Kumar Sharma |
| Solution 2 | sajid |
