FileNotFoundException when submitting to Spark Cluster
I've created a small application using Apache Spark. When I run it locally everything works fine, but when I submit it to a 6-node cluster I get a FileNotFoundException because the cluster can't find the input file.
This is my tiny application:

```scala
import org.apache.spark.{SparkConf, SparkContext}

def main(args: Array[String]) {
  val sparkContext = new SparkContext(new SparkConf())
  val tweets = sparkContext.textFile(args(0))
  tweets.map { line => (line, LanguageDetector.create().detect(line)) }
    .saveAsTextFile("/data/detected")
}
```
I submit the application with the following command:
```shell
/opt/spark-1.0.2-bin-hadoop2/bin/spark-submit --class YarnTest --master spark://luthor-v1:7077 lang_detect.jar twitter_data
```
After submitting I get the following exception:

```
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 failed 4 times, most recent failure: Exception failure in TID 6 on host luthor-v5: java.io.FileNotFoundException: File file:/opt/bb/twitter_data does not exist
```
The file is definitely there: the jar and the file are in the same directory, and the full path resolves.
Thanks in advance
Solution 1:[1]
spark-submit assumes that the jar resides in the current working directory and that the input path refers to HDFS. Copy your file twitter_data from the local file system to HDFS like this:

```shell
hadoop fs -copyFromLocal twitter_data /twitter_data
```
This copies the file into the / directory of HDFS. Now run the command:

```shell
spark-submit --class YarnTest --master spark://luthor-v1:7077 lang_detect.jar /twitter_data
```
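To see why the bare path failed: without an explicit scheme, each node resolves `twitter_data` against its own default filesystem, which on the workers was the local `file:` filesystem, hence `file:/opt/bb/twitter_data`. A minimal sketch of that resolution logic, assuming a simple scheme check (the `qualify` helper and the NameNode host/port are illustrative, not part of Spark's API):

```scala
// Sketch only (not Spark's implementation): make the target filesystem
// explicit so the driver and every executor resolve the same location.
def qualify(path: String, defaultFs: String): String =
  if (path.matches("^[a-zA-Z][a-zA-Z0-9+.-]*:.*")) path // already has a scheme (hdfs://, file:/)
  else if (path.startsWith("/")) defaultFs + path       // absolute path on the default FS
  else defaultFs + "/" + path                           // relative path, anchored at the FS root
```

So passing a fully qualified URI such as `hdfs://luthor-v1:8020/twitter_data` (the port is an example; use your NameNode's) also removes the ambiguity without relying on a default filesystem.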
Solution 2:[2]
The Hadoop configuration directory referenced in spark-env.sh is probably wrong. Please check it. It should be: "your_hadoop_dir /etc/hadoop/"
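As a hedged sketch of what that setting might look like in conf/spark-env.sh (the `your_hadoop_dir` placeholder stands for your actual Hadoop installation directory):

```shell
# conf/spark-env.sh -- point Spark at the Hadoop client configuration
# so that bare paths resolve against HDFS instead of the local filesystem.
export HADOOP_CONF_DIR=your_hadoop_dir/etc/hadoop
```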
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Anas |
| Solution 2 | Carlos AG |
