java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/metadata/HiveException while running DAG in Airflow

I am running a Python project through a DAG in Airflow, and I hit the following exception when the DAG runs this line from the project -

df = spark.sql(query)

Exception -

py4j.protocol.Py4JJavaError: An error occurred while calling o147.sql. 
: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/metadata/HiveException 

Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.metadata.HiveException 

I have the Hive metastore version set to 3.1.2, all the jar files are in /opt/hive/lib, and config files (hive-site.xml etc.) are in /project_name/conf. I also unzipped hive-exec.jar and found that it does contain HiveException.class, so I am not sure why this error is thrown.
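For reference, this is roughly how I verified the class is packaged in the jar without fully unzipping it (a minimal sketch; the jar path in the comment is illustrative and should point at the actual hive-exec jar):

```python
# Sketch: check whether a fully-qualified class is packaged inside a jar.
# A jar is just a zip archive, so the stdlib zipfile module can list it.
import zipfile

def jar_contains_class(jar_path, class_name):
    """Return True if class_name (dotted form) has a .class entry in the jar."""
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()

# Example (path is illustrative):
# jar_contains_class("/opt/hive/lib/hive-exec-3.1.2.jar",
#                    "org.apache.hadoop.hive.ql.metadata.HiveException")
```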

Furthermore, I have made changes in the DAG: in __confbase I have added these -

"spark.driver.extraLibraryPath": f'/opt/hive/lib/*'
"spark.executor.extraLibraryPath": f'/opt/hive/lib/*'

I have also explicitly passed the same jars as a parameter to SparkSubmitOperator -

spark_submit_task = SparkSubmitOperator(
    task_id='spark_submit_task',
    conn_id='spark_conn',
    name=f'{jobname}_extract',
    jars=f'/opt/hive/lib/*',
    application="local:///*.py",
    application_args=[*],
    conf={
        'spark.hadoop.fs.defaultFS': f'{s3_bucket_uri}/',
        'spark.kubernetes.container.image': test_image,
        "spark.hadoop.hive.metastore.uris": metastore_uri,
        "spark.kubernetes.driverEnv.jobname": f'{jobname}_load',
        "spark.executorEnv.jobname": f'{jobname}_load',
        **__confbase,
        **__sparkexecutorconfig_1,
        **__hive_metastore_config_312,
        **__aqe,
        **__dynamicPartitionConfig,
    },
    dag=dag,
    env_vars=env_vars,
    executor_config=executor_config,
)
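Since I am not certain the glob in jars is expanded (my understanding is that spark-submit's --jars expects a comma-separated list of paths), I have also considered building the list explicitly; a minimal sketch, with the directory assumed to be my /opt/hive/lib mount:

```python
# Sketch: expand a jar directory into the explicit comma-separated list
# that spark-submit's --jars option expects, instead of passing a glob.
import glob
import os

def jar_list(lib_dir):
    """Return a comma-separated, sorted list of the .jar files in lib_dir."""
    return ",".join(sorted(glob.glob(os.path.join(lib_dir, "*.jar"))))

# Usage (illustrative): jars=jar_list("/opt/hive/lib")
```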

Any suggestions would be helpful.



Source: Stack Overflow, licensed under CC BY-SA 3.0.