Can I send a whole folder with spark-submit?
I know that one can send files through spark-submit with the --files option, but is it also possible to send a whole folder?
Actually, I want to send the lib folder, which contains the jar files of external libraries. Or does the --jars option already create a lib folder in the executor's working directory? In my case it is necessary that there is a lib folder, otherwise the job fails with an error.
Solution 1:[1]
No, the spark-submit --files option doesn't support sending a folder, but you can put all your files into a zip archive and pass that single archive to --files. In your Spark job you can then use SparkFiles.get(filename) to locate the file, extract it, and use the extracted files. 'filename' doesn't need to be an absolute path; the file name alone is enough.
PS: This works only after the SparkContext has been initialized.
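For illustration, here is a minimal PySpark sketch of that approach (not part of the original answer; the archive name lib.zip and the extraction directory lib are assumptions):

import zipfile
from pyspark import SparkFiles
from pyspark.sql import SparkSession

# Assumes the job was submitted with: spark-submit --files lib.zip my_job.py
spark = SparkSession.builder.appName("files-zip-example").getOrCreate()

# SparkFiles.get() resolves the local path of a file shipped with --files.
# As noted above, it only works after the SparkContext/SparkSession exists.
zip_path = SparkFiles.get("lib.zip")

# Extract the archive into a local lib directory and use its contents.
with zipfile.ZipFile(zip_path) as archive:
    archive.extractall("lib")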
Solution 2:[2]
You can do this:
- Archive the folder --> myfiles.zip
- Ship the archive using the "spark.yarn.dist.archives" conf:
Example:
spark-submit \
...
--conf spark.yarn.dist.archives=myfiles.zip
...
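One detail worth adding (it is not stated in the answer above, so treat it as an assumption to verify against the Spark-on-YARN documentation): the archive is unpacked into each container's working directory, and you can append #name to choose the directory it is linked under. For example:

spark-submit \
...
--conf spark.yarn.dist.archives=myfiles.zip#lib
...

With that, the contents of myfiles.zip should appear under ./lib in the executors' working directory, which matches the lib folder layout the question asks for.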
Solution 3:[3]
I think you have multiple ways to do this.
First, I understand that you want to automate this, but if you don't have many jars you can simply pass them one by one to the --jars option.
Otherwise you can sudo mv all your jars into the jars directory of your Spark installation, but that is annoying to do on every node of a cluster.
So finally, you can build the --jars argument automatically from the contents of your lib folder with a small shell snippet (a sketch follows below).
This doesn't solve the problem if you need cluster mode, though. For cluster mode, I would modify that snippet to list a directory on HDFS instead, and put all your jars in that HDFS directory.
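Here is a minimal sketch of such a snippet (not from the original answer; the lib path and the job file my_job.py are hypothetical):

# Join every jar in ./lib into a comma-separated list for --jars
JARS=$(ls lib/*.jar | tr '\n' ',' | sed 's/,$//')
spark-submit \
--jars "$JARS" \
my_job.py

For cluster mode, the same idea applies with hdfs dfs -ls on an HDFS directory instead of the local lib folder, extracting the paths from its output.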
Maybe there are other solutions, but those were my thoughts.
Have a good weekend!
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | |
| Solution 3 | tricky |
