Running a spark-submit job in cluster deploy mode fails but passes in client mode
EDIT: By removing the 'setMaster' conf setting in the app, I'm able to run yarn-cluster successfully (see the sketch below). If anyone could help with using the Spark standalone master in cluster deploy mode, that'd be fantastic.
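A minimal sketch of what that fix looks like, assuming the app is written in Scala; everything inside FileInputRename other than the SparkConf handling is hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object FileInputRename {
  def main(args: Array[String]): Unit = {
    // Do NOT call .setMaster("spark://127.0.0.1:7077") here: a master
    // hard-coded in the app overrides the --master/--deploy-mode flags
    // passed to spark-submit, which breaks cluster deploy mode.
    val conf = new SparkConf().setAppName("FileInputRename")
    val sc = new SparkContext(conf)
    val lines = sc.textFile(args(0)) // e.g. s3://bucket/jar/fileInputRename.txt
    println(s"read ${lines.count()} lines")
    sc.stop()
  }
}
```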
I'm trying to set up Spark on a local test machine so that I can read from an S3 bucket and then write back to it.
Running the jar/application in client mode works fine, in that it goes off to the bucket, creates a file, and comes back again.
However, I need this to work in cluster mode so that it more closely resembles our prod environment, yet it constantly fails, with no real sensible messages in the logs that I can see and little feedback to go on.
Any help is greatly appreciated - I'm very new to Spark/Hadoop, so I may have overlooked something obvious.
I also tried running with yarn-cluster as the master, but that failed for a different reason (saying it couldn't find the s3native classes, which I pass in as jars).
This is on a Windows environment.
The command I'm running:
c:\>spark-submit --jars="C:\Spark\hadoop\share\hadoop\common\lib\hadoop-aws-2.7.1.jar,C:\Spark\hadoop\share\hadoop\common\lib\aws-java-sdk-1.7.4.jar" --verbose --deploy-mode cluster --master spark://127.0.0.1:7077 --class FileInputRename c:\sparkSubmit\sparkSubmit_NoJarSetInConf.jar "s3://bucket/jar/fileInputRename.txt"
The output from this on the console is:
Using properties file: C:\Spark\bin\..\conf\spark-defaults.conf
Parsed arguments:
master spark://127.0.0.1:7077
deployMode cluster
executorMemory null
executorCores null
totalExecutorCores null
propertiesFile C:\Spark\bin\..\conf\spark-defaults.conf
driverMemory null
driverCores null
driverExtraClassPath null
driverExtraLibraryPath null
driverExtraJavaOptions null
supervise false
queue null
numExecutors null
files null
pyFiles null
archives null
mainClass FileInputRename
primaryResource file:/c:/sparkSubmit/sparkSubmit_NoJarSetInConf.jar
name FileInputRename
childArgs [s3://SessionCam-Steve/jar/fileInputRename.txt]
jars file:/C:/Spark/hadoop/share/hadoop/common/lib/hadoop-aws-2.7.1.jar,file:/C:/Spark/hadoop/share/hadoop/common/lib/aws-java-sdk-1.7.4.jar
packages null
packagesExclusions null
repositories null
verbose true
Spark properties used, including those specified through
--conf and those from the properties file C:\Spark\bin\..\conf\spark-defaults.conf:
Running Spark using the REST application submission protocol.
Main class:
org.apache.spark.deploy.rest.RestSubmissionClient
Arguments:
file:/c:/sparkSubmit/sparkSubmit_NoJarSetInConf.jar
FileInputRename
s3://SessionCam-Steve/jar/fileInputRename.txt
System properties:
SPARK_SUBMIT -> true
spark.driver.supervise -> false
spark.app.name -> FileInputRename
spark.jars -> file:/C:/Spark/hadoop/share/hadoop/common/lib/hadoop-aws-2.7.1.jar,file:/C:/Spark/hadoop/share/hadoop/common/lib/aws-java-sdk-1.7.4.jar,file:/c:/sparkSubmit/sparkSubmit_NoJarSetInConf.jar
spark.submit.deployMode -> cluster
spark.master -> spark://127.0.0.1:7077
Classpath elements:
16/03/24 12:01:56 INFO rest.RestSubmissionClient: Submitting a request to launch an application in spark://127.0.0.1:7077.
After a few more seconds it returns to the C:\ prompt with nothing else. The web UI on port 8080 shows:
| Application ID | Name | Cores | Memory per Node | Submitted Time | User | State | Duration |
|---|---|---|---|---|---|---|---|
| app-20160324120221-0016 | FileInputRename | 1 | 1024.0 MB | 2016/03/24 12:02:21 | Administrator | FINISHED | 3 s |
where the log (where I'd expect an error message) only shows:
16/03/24 12:02:24 INFO spark.SecurityManager: Changing view acls to: Administrator
16/03/24 12:02:24 INFO spark.SecurityManager: Changing modify acls to: Administrator
16/03/24 12:02:24 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Administrator); users with modify permissions: Set(Administrator)
If I run with yarn-cluster as the master, so that this is my command:
c:>spark-submit --jars="C:\Spark\hadoop\share\hadoop\common\lib\hadoop-aws-2.7.1.jar,C:\Spark\hadoop\share\hadoop\common\lib\aws-java-sdk-1.7.4.jar" --verbose --master yarn-cluster --class FileInputRename c:\sparkSubmit\sparkSubmit_NoJarSetInConf.jar "s3://SessionCam-Steve/jar/fileInputRename.txt"
The output and exception:
Using properties file: C:\Spark\bin\..\conf\spark-defaults.conf
Parsed arguments:
master yarn-cluster
deployMode null
executorMemory null
executorCores null
totalExecutorCores null
propertiesFile C:\Spark\bin\..\conf\spark-defaults.conf
driverMemory null
driverCores null
driverExtraClassPath null
driverExtraLibraryPath null
driverExtraJavaOptions null
supervise false
queue null
numExecutors null
files null
pyFiles null
archives null
mainClass FileInputRename
primaryResource file:/c:/sparkSubmit/sparkSubmit_NoJarSetInConf.jar
name FileInputRename
childArgs [s3://SessionCam-Steve/jar/fileInputRename.txt]
jars file:/C:/Spark/hadoop/share/hadoop/common/lib/hadoop-aws-2.7.1.jar,file:/C:/Spark/hadoop/share/hadoop/common/lib/aws-java-sdk-1.7.4.jar
packages null
packagesExclusions null
repositories null
verbose true
Spark properties used, including those specified through
--conf and those from the properties file C:\Spark\bin\..\conf\spark-defaults.conf:
Main class:
org.apache.spark.deploy.yarn.Client
Arguments:
--name
FileInputRename
--addJars
file:/C:/Spark/hadoop/share/hadoop/common/lib/hadoop-aws-2.7.1.jar,file:/C:/Spark/hadoop/share/hadoop/common/lib/aws-java-sdk-1.7.4.jar
--jar
file:/c:/sparkSubmit/sparkSubmit_NoJarSetInConf.jar
--class
FileInputRename
--arg
s3://SessionCam-Steve/jar/fileInputRename.txt
System properties:
SPARK_SUBMIT -> true
spark.app.name -> FileInputRename
spark.submit.deployMode -> cluster
spark.master -> yarn-cluster
Classpath elements:
16/03/24 12:05:23 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/03/24 12:05:23 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
16/03/24 12:05:23 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
16/03/24 12:05:23 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
16/03/24 12:05:23 INFO yarn.Client: Setting up container launch context for our AM
16/03/24 12:05:23 INFO yarn.Client: Setting up the launch environment for our AM container
16/03/24 12:05:23 INFO yarn.Client: Preparing resources for our AM container
16/03/24 12:05:24 WARN : Your hostname, WIN-EU4MXZ2GSIW resolves to a loopback/non-reachable address: fe80:0:0:0:0:5efe:a94:1d11%14, but we couldn't find any external IP address!
16/03/24 12:05:25 INFO yarn.Client: Uploading resource file:/C:/Spark/lib/spark-assembly-1.6.1-hadoop2.6.0.jar -> hdfs://0.0.0.0:19000/user/Administrator/.sparkStaging/application_1458817514983_0004/spark-assembly-1.6.1-had
16/03/24 12:05:27 INFO yarn.Client: Uploading resource file:/c:/sparkSubmit/sparkSubmit_NoJarSetInConf.jar -> hdfs://0.0.0.0:19000/user/Administrator/.sparkStaging/application_1458817514983_0004/sparkSubmit_NoJarSetInConf.j
16/03/24 12:05:27 INFO yarn.Client: Uploading resource file:/C:/Spark/hadoop/share/hadoop/common/lib/hadoop-aws-2.7.1.jar -> hdfs://0.0.0.0:19000/user/Administrator/.sparkStaging/application_1458817514983_0004/hadoop-aws-2.
16/03/24 12:05:27 INFO yarn.Client: Uploading resource file:/C:/Spark/hadoop/share/hadoop/common/lib/aws-java-sdk-1.7.4.jar -> hdfs://0.0.0.0:19000/user/Administrator/.sparkStaging/application_1458817514983_0004/aws-java-sd
16/03/24 12:05:27 INFO yarn.Client: Uploading resource file:/C:/temp/2/spark-12375b13-dac4-42b8-9ff6-19b0f895c5d1/__spark_conf__7363738392648975127.zip -> hdfs://0.0.0.0:19000/user/Administrator/.sparkStaging/application_14
16/03/24 12:05:28 INFO spark.SecurityManager: Changing view acls to: Administrator
16/03/24 12:05:28 INFO spark.SecurityManager: Changing modify acls to: Administrator
16/03/24 12:05:28 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Administrator); users with modify permissions: Set(Administrator)
16/03/24 12:05:28 INFO yarn.Client: Submitting application 4 to ResourceManager
16/03/24 12:05:29 INFO impl.YarnClientImpl: Submitted application application_1458817514983_0004
16/03/24 12:05:30 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:05:30 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1458821128787
final status: UNDEFINED
tracking URL: http://WIN-EU4MXZ2GSIW:8088/proxy/application_1458817514983_0004/
user: Administrator
16/03/24 12:05:31 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
[... the same ACCEPTED application report repeats roughly once per second ...]
16/03/24 12:06:09 INFO yarn.Client: Application report for application_1458817514983_0004 (state: ACCEPTED)
16/03/24 12:06:10 INFO yarn.Client: Application report for application_1458817514983_0004 (state: FAILED)
16/03/24 12:06:10 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1458817514983_0004 failed 2 times due to AM Container for appattempt_1458817514983_0004_000002 exited with exitCode: 15
For more detailed output, check application tracking page:http://WIN-EU4MXZ2GSIW:8088/cluster/app/application_1458817514983_0004Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1458817514983_0004_02_000001
Exit code: 15
Stack trace: ExitCodeException exitCode=15:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Shell output: 1 file(s) moved.
Container exited with a non-zero exit code 15
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1458821128787
final status: FAILED
tracking URL: http://WIN-EU4MXZ2GSIW:8088/cluster/app/application_1458817514983_0004
user: Administrator
Exception in thread "main" org.apache.spark.SparkException: Application application_1458817514983_0004 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1034)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/03/24 12:06:10 INFO util.ShutdownHookManager: Shutdown hook called
16/03/24 12:06:10 INFO util.ShutdownHookManager: Deleting directory C:\temp\2\spark-12375b13-dac4-42b8-9ff6-19b0f895c5d1
This creates two application IDs in the GUI:
| Application ID | Name | Cores | Memory per Node | Submitted Time | User | State | Duration |
|---|---|---|---|---|---|---|---|
| app-20160324120600-0018 | FileInputRename | 2 | 1024.0 MB | 2016/03/24 12:06:00 | Administrator | FINISHED | 9 s |
| app-20160324120543-0017 | FileInputRename | 2 | 1024.0 MB | 2016/03/24 12:05:43 | Administrator | FINISHED | 8 s |
Both of them show this as the exception:
16/03/24 12:05:49 ERROR executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3native.NativeS3FileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2074)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2578)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:84)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1193)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1185)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3native.NativeS3FileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1980)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2072)
... 16 more
It'd be fantastic and a huge relief if I could get either of these working - thank you in advance for any help.
Solution 1:[1]
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3native.NativeS3FileSystem not found
You have a classpath problem. In cluster mode the "application" (the driver) is executed on one of the nodes, which probably has its own classpath, so:
Either hadoop-aws-2.7.1.jar is for some reason not present at all (despite the fact that you provide it with --jars; check whether it is present on every worker at the provided path),
or there is another hadoop-aws jar of a different version on the classpath (personally I think it is the second variant). Try removing the --jars=... argument and see if that helps. A quick way to check which jar, if any, a worker actually loads the class from is sketched below.
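A hypothetical one-off diagnostic, assuming a Scala app or a spark-shell session on a worker; the class name comes straight from the stack trace above:

```scala
// Paste into spark-shell (or run inside a task) to see which jar,
// if any, this JVM loads the NativeS3FileSystem class from.
try {
  val cls = Class.forName("org.apache.hadoop.fs.s3native.NativeS3FileSystem")
  // Prints the jar path, which exposes a version mismatch at a glance.
  println("loaded from: " + cls.getProtectionDomain.getCodeSource.getLocation)
} catch {
  case _: ClassNotFoundException => println("not on this JVM's classpath")
}
```

If the printed jar is not the hadoop-aws-2.7.1.jar you passed with --jars, the worker is picking up a different version.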
Solution 2:[2]
If you already have your YARN cluster installed and configured, you just need to set the HADOOP_CONF_DIR environment variable to point at your client Hadoop configuration. You don't need to specify those AWS jars, since they are already provided by your Hadoop cluster; if you provide them again, they could conflict with the ones already there. See the Spark documentation on spark-submit for reference. I would also suggest using s3n as the protocol to read from and write to S3, as in the sketch below.
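A short sketch of reading and writing with the s3n:// scheme, assuming a Scala app; the bucket name, paths, and credential values are placeholders:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object S3nExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("S3nExample"))
    // fs.s3n.* are the standard NativeS3FileSystem credential keys;
    // the values here are placeholders, not real credentials.
    sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "<ACCESS_KEY>")
    sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "<SECRET_KEY>")
    // Read from the bucket and write back to it, as the question does.
    val lines = sc.textFile("s3n://my-bucket/jar/fileInputRename.txt")
    lines.saveAsTextFile("s3n://my-bucket/jar/output")
    sc.stop()
  }
}
```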
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Igor Berman |
| Solution 2 | PinoSan |
