java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/metadata/HiveException while running a DAG in Airflow
I am running a Python project through a DAG in Airflow, and I encounter the following exception when the DAG executes this line from the project:
df = spark.sql(query)
Exception:
py4j.protocol.Py4JJavaError: An error occurred while calling o147.sql.
: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/metadata/HiveException
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.metadata.HiveException
I have the Hive metastore version set to 3.1.2, all the jar files are in /opt/hive/lib, and the config files (hive-site.xml etc.) are in /project_name/conf. I also unzipped hive-exec.jar and confirmed that it does contain HiveException.class, so I am not sure why this error is thrown.
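For reference, this is roughly how I checked for the class (the exact jar filename is a guess on my part; adjust it to whatever version actually sits in /opt/hive/lib):

    import zipfile

    # hive-exec-3.1.2.jar is an assumed filename; use the one in /opt/hive/lib
    jar_path = "/opt/hive/lib/hive-exec-3.1.2.jar"
    with zipfile.ZipFile(jar_path) as jar:
        print("org/apache/hadoop/hive/ql/metadata/HiveException.class" in jar.namelist())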
Furthermore, I have made changes in the DAG: in __confbase I have added these entries:
"spark.driver.extraLibraryPath": f'/opt/hive/lib/*'
"spark.executor.extraLibraryPath": f'/opt/hive/lib/*'
I have also passed the same jars explicitly as a parameter to the SparkSubmitOperator:
spark_submit_task = SparkSubmitOperator(
    task_id='spark_submit_task',
    conn_id='spark_conn',
    name=f'{jobname}_extract',
    jars=f'/opt/hive/lib/*',
    application="local:///*.py",
    application_args=[*],
    conf={
        'spark.hadoop.fs.defaultFS': f'{s3_bucket_uri}/',
        'spark.kubernetes.container.image': test_image,
        "spark.hadoop.hive.metastore.uris": metastore_uri,
        "spark.kubernetes.driverEnv.jobname": f'{jobname}_load',
        "spark.executorEnv.jobname": f'{jobname}_load',
        **__confbase,
        **__sparkexecutorconfig_1,
        **__hive_metastore_config_312,
        **__aqe,
        **__dynamicPartitionConfig,
    },
    dag=dag,
    env_vars=env_vars,
    executor_config=executor_config,
)
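I am also unsure whether the glob in the jars parameter gets expanded the way I expect, so as a sanity check I considered building an explicit comma-separated list instead (a sketch; it assumes everything needed really is under /opt/hive/lib):

    import glob

    # sketch: pass concrete jar paths instead of a glob; SparkSubmitOperator's
    # `jars` maps to spark-submit's --jars, which takes a comma-separated list
    hive_jars = ",".join(sorted(glob.glob("/opt/hive/lib/*.jar")))
    # then: jars=hive_jars in the SparkSubmitOperator call above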
Any suggestions would be helpful.