libclntsh.so: cannot open shared object file when running a Python program on a Spark cluster on Ubuntu

I have a Python program that runs without any issue locally, but when I run it on a Spark cluster I get an error about libclntsh.so. The cluster has two nodes.

To explain the setup: to run the program on the cluster, I first set the master IP address in spark-env.sh like this:

  export SPARK_MASTER_HOST=x.x.x.x

Then I write the IPs of the worker nodes to $SPARK_HOME/conf/workers. After that, I start the master with this line:

  /opt/spark/sbin/start-master.sh

Then I start the workers:

  /opt/spark/sbin/start-worker.sh spark://x.x.x.x:7077

Next I check that the Spark UI is up, and then I run the program on the master node like this:

  /opt/spark/bin/spark-submit --master spark://x.x.x.x:7077 --files sparkConfig.json --py-files cst_utils.py,grouping.py,group_state.py,g_utils.py,csts.py,oracle_connection.py,config.py,brn_utils.py,emp_utils.py main.py  

When the above command runs, I receive this error:

      File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 604, in main
        process()
      File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 594, in process
        out_iter = func(split_index, iterator)
      File "/opt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2916, in pipeline_func
      File "/opt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2916, in pipeline_func
      File "/opt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 418, in func
      File "/opt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2144, in combineLocally
      File "/opt/spark/python/lib/pyspark.zip/pyspark/shuffle.py", line 240, in mergeValues
        for k, v in iterator:
      File "/opt/spark/python/lib/pyspark.zip/pyspark/util.py", line 73, in wrapper
        return f(*args, **kwargs)
      File "/opt/spark/work/app-20220221165611-0005/0/customer_utils.py", line 340, in read_cst
        df_group = connection.read_sql(query_cnt)
      File "/opt/spark/work/app-20220221165611-0005/0/oracle_connection.py", line 109, in read_sql
        self.connect()
      File "/opt/spark/work/app-20220221165611-0005/0/oracle_connection.py", line 40, in connect
        self.conn = cx_Oracle.connect(db_url)
    cx_Oracle.DatabaseError: DPI-1047: Cannot locate a 64-bit Oracle Client library:
    "libclntsh.so: cannot open shared object file: No such file or directory".
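To narrow this down, I can reproduce the failure outside of Spark with a small check that loads the library the same way cx_Oracle does. This is just a diagnostic sketch; the Instant Client path is the one from my own installation:

```python
import ctypes
import os


def can_load_oracle_client(lib_dir):
    """Return True if libclntsh.so can be dlopen'ed from lib_dir.

    cx_Oracle raises DPI-1047 when this dlopen fails, so running
    this on each worker node shows which node is missing the library.
    """
    try:
        ctypes.CDLL(os.path.join(lib_dir, "libclntsh.so"))
        return True
    except OSError:
        return False


if __name__ == "__main__":
    # Path from my ~/.bashrc; adjust per node.
    lib_dir = os.environ.get("ORACLE_HOME", "/usr/share/oracle/instantclient_19_8")
    print("loaded" if can_load_oracle_client(lib_dir) else "not found")
```

Running this directly on the master node prints "loaded", so the library itself is fine there.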

I set these environment variables in ~/.bashrc:

    export ORACLE_HOME=/usr/share/oracle/instantclient_19_8
    export LD_LIBRARY_PATH=$ORACLE_HOME:$LD_LIBRARY_PATH
    export PATH=$ORACLE_HOME:$PATH
    export JAVA_HOME=/usr/lib/jvm/java/jdk1.8.0_271
    export SPARK_HOME=/opt/spark
    export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
    export PATH=$PATH:$JAVA_HOME/bin
    export PYSPARK_PYTHON=/usr/bin/python3
    export PYSPARK_HOME=/usr/bin/python3.8
    export PYSPARK_DRIVER_PYTHON=python3.8
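
Since the worker processes are started by start-worker.sh rather than from an interactive shell, I suspect they may not source ~/.bashrc, so I also tried passing the library path to the executors explicitly. This is a sketch of what I attempted, using Spark's spark.executorEnv.* setting with the paths from my setup:

```shell
# Pass LD_LIBRARY_PATH to the executor processes directly,
# in case the workers do not pick it up from ~/.bashrc.
/opt/spark/bin/spark-submit \
  --master spark://x.x.x.x:7077 \
  --conf spark.executorEnv.LD_LIBRARY_PATH=/usr/share/oracle/instantclient_19_8 \
  --files sparkConfig.json \
  --py-files cst_utils.py,grouping.py,group_state.py,g_utils.py,csts.py,oracle_connection.py,config.py,brn_utils.py,emp_utils.py \
  main.py
```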

Could you please guide me on what is wrong?

Any help would be appreciated.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow