Spark-submit not working when application jar is in HDFS

I'm trying to run a Spark application using bin/spark-submit. When I reference my application jar on my local filesystem, it works. However, when I copied my application jar to a directory in HDFS, I get the following exception:

Warning: Skip remote jar hdfs://localhost:9000/user/hdfs/jars/simple-project-1.0-SNAPSHOT.jar.
java.lang.ClassNotFoundException: com.example.SimpleApp

Here's the command:

$ ./bin/spark-submit --class com.example.SimpleApp --master local hdfs://localhost:9000/user/hdfs/jars/simple-project-1.0-SNAPSHOT.jar

I'm using Hadoop 2.6.0 and Spark 1.2.1.



Solution 1:[1]

The only way it worked for me was to use

--master yarn-cluster
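A full invocation along these lines (a sketch reusing the class and jar path from the question; in yarn-cluster mode the driver runs on the cluster, which can fetch the jar from HDFS itself) might look like the following, shown as a dry run that only prints the command:

```shell
# Sketch only: jar path and class name are taken from the question above.
# In yarn-cluster mode the driver runs inside the cluster, which can read
# the jar directly from HDFS, avoiding the ClassNotFoundException.
JAR=hdfs://localhost:9000/user/hdfs/jars/simple-project-1.0-SNAPSHOT.jar
CMD="./bin/spark-submit --class com.example.SimpleApp --master yarn-cluster $JAR"
echo "$CMD"   # dry run: print the command rather than submitting
```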

Solution 2:[2]

To make a jar stored on HDFS accessible to a Spark job, you have to run the job in cluster mode.

$SPARK_HOME/bin/spark-submit \
--deploy-mode cluster \
--class <main_class> \
--master yarn-cluster \
hdfs://myhost:8020/user/root/myjar.jar

Also, there is a Spark JIRA raised for client mode, which is not supported yet:

SPARK-10643: Support HDFS application download in client mode spark submit
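Until that issue is resolved, one workaround sketch for client or local mode is to pull the jar out of HDFS first and submit the local copy (paths are taken from the question; the /tmp destination is an arbitrary assumption). Shown as a dry run that only prints the two commands:

```shell
# Workaround sketch: fetch the jar from HDFS, then submit the local copy.
# The /tmp destination is an assumption; any local path would work.
JAR=hdfs://localhost:9000/user/hdfs/jars/simple-project-1.0-SNAPSHOT.jar
LOCAL=/tmp/simple-project-1.0-SNAPSHOT.jar
FETCH="hdfs dfs -get $JAR $LOCAL"
SUBMIT="./bin/spark-submit --class com.example.SimpleApp --master local $LOCAL"
printf '%s\n%s\n' "$FETCH" "$SUBMIT"   # dry run: print rather than execute
```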

Solution 3:[3]

There is a workaround: you can mount the directory in HDFS (which contains your application jar) as a local directory.

I did the same with Azure storage, but it should be similar for HDFS.

Example mount command for an Azure file share:

sudo mount -t cifs //{storageAccountName}.file.core.windows.net/{directoryName} {local directory path} -o vers=3.0,username={storageAccountName},password={storageAccountKey},dir_mode=0777,file_mode=0777

Now, in your spark-submit command, you provide the path from the command above:

$ ./bin/spark-submit --class com.example.SimpleApp --master local {local directory path}/simple-project-1.0-SNAPSHOT.jar
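For HDFS itself, a comparable mount could go through the HDFS NFS gateway (assuming the gateway service is configured and running on the cluster; the gateway host and mount point below are placeholders). Shown as a dry run that only prints the mount command:

```shell
# Hypothetical mount of HDFS via its NFS gateway; the gateway host
# (localhost) and mount point (/mnt/hdfs) are assumptions.
# vers=3,proto=tcp,nolock are the options the gateway expects.
MOUNT="sudo mount -t nfs -o vers=3,proto=tcp,nolock localhost:/ /mnt/hdfs"
echo "$MOUNT"   # dry run: print the mount command rather than running it
```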

Solution 4:[4]

spark-submit --master spark://kssr-virtual-machine:7077 --deploy-mode client --executor-memory 1g hdfs://localhost:9000/user/wordcount.py

This works for me: I am using Hadoop 3.3.1 and Spark 3.2.1, and I am able to read the file from HDFS.

Solution 5:[5]

Yes, it has to be a local file. I think that's simply the answer.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Romain
Solution 2 enrique-carbonell
Solution 3 OneCricketeer
Solution 4 Kumar Sanu
Solution 5 Sean Owen