Run Spark in IntelliJ IDEA on a Standalone Cluster with the Master on the Same Windows Machine

I have been able to successfully run a Spark application in IntelliJ IDEA when the master is set to local[*]. However, when I set the master to a separate standalone Spark instance running on the same machine, an exception occurs.
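
For reference, the only difference between the run that works and the run that fails is the master URL passed to setMaster. A minimal sketch of the two settings (the wrapper object is only for illustration):

import org.apache.spark.SparkConf

object MasterUrls {
  // Works: Spark runs embedded in the IntelliJ IDEA JVM.
  val localConf = new SparkConf().setMaster("local[*]").setAppName("Spark Pi")

  // Fails: tasks are shipped to the standalone master/worker on this machine.
  val clusterConf = new SparkConf().setMaster("spark://tjvrlaptop:7077").setAppName("Spark Pi")
}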

The SparkPi App I am trying to execute is below.

import scala.math.random

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

/** Computes an approximation to pi */
object SparkPi {
  def main(args: Array[String]) {
    val conf = new SparkConf().setMaster("spark://tjvrlaptop:7077").setAppName("Spark Pi") //.set("spark.scheduler.mode", "FIFO").set("spark.cores.max", "8")
    val spark = new SparkContext(conf)
    val slices = if (args.length > 0) args(0).toInt else 20
    val n = math.max(100000000L * slices, Int.MaxValue).toInt // avoid overflow

    for(j <- 1 to 1000000) {
      val count = spark.parallelize(1 until n, slices).map { i =>
        val x = random * 2 - 1
        val y = random * 2 - 1
        if (x * x + y * y < 1) 1 else 0
      }.reduce(_ + _)
      println("Pi is roughly " + 4.0 * count / n)
    }
    spark.stop()
  }
}

Here are the contents of my build.sbt:

name := "SBTScalaSparkPi"

version := "1.0"

scalaVersion := "2.10.6"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1"

Here are the contents of my plugins.sbt:

logLevel := Level.Warn

I started the Spark master and a worker with the following commands, each in its own command prompt on the same machine.

spark-1.6.1-bin-hadoop2.6\bin>spark-class org.apache.spark.deploy.master.Master --host tjvrlaptop

spark-1.6.1-bin-hadoop2.6\bin>spark-class org.apache.spark.deploy.worker.Worker spark://tjvrlaptop:7077

[The Master and Worker seem to be up and running without any issues.][1]

[1]: http://i.stack.imgur.com/B3BDZ.png

Next I tried to run the program in IntelliJ. It fails after a while with the following errors:

**Command Prompt where the Master is running**

>     16/03/27 14:44:33 INFO Master: Registering app Spark Pi
>     16/03/27 14:44:33 INFO Master: Registered app Spark Pi with ID app-20160327144433-0000
>     16/03/27 14:44:33 INFO Master: Launching executor app-20160327144433-0000/0 on worker worker-20160327140440-192.168.56.1-52701
>     16/03/27 14:44:38 INFO Master: Received unregister request from application app-20160327144433-0000
>     16/03/27 14:44:38 INFO Master: Removing app app-20160327144433-0000
>     16/03/27 14:44:38 INFO Master: TJVRLAPTOP:55368 got disassociated, removing it.
>     16/03/27 14:44:38 INFO Master: 192.168.56.1:55350 got disassociated, removing it.
>     16/03/27 14:44:38 WARN Master: Got status update for unknown executor app-20160327144433-0000/0

**Command Prompt where the Worker is running**

> 16/03/27 14:44:34 INFO Worker: Asked to launch executor app-20160327144433-0000/0 for Spark Pi
> 16/03/27 14:44:34 INFO SecurityManager: Changing view acls to: tjoha
> 16/03/27 14:44:34 INFO SecurityManager: Changing modify acls to: tjoha
> 16/03/27 14:44:34 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(tjoha); users with modify permissions: Set(tjoha)
> 16/03/27 14:44:34 INFO ExecutorRunner: Launch command: "C:\Program Files\Java\jre1.8.0_77\bin\java" "-cp" "C:\Users\tjoha\Documents\spark-1.6.1-bin-hadoop2.6\bin\..\conf\;C:\Users\tjoha\Documents\spark-1.6.1-bin-hadoop2.6\bin\..\lib\spark-assembly-1.6.1-hadoop2.6.0.jar;C:\Users\tjoha\Documents\spark-1.6.1-bin-hadoop2.6\bin\..\lib\datanucleus-api-jdo-3.2.6.jar;C:\Users\tjoha\Documents\spark-1.6.1-bin-hadoop2.6\bin\..\lib\datanucleus-core-3.2.10.jar;C:\Users\tjoha\Documents\spark-1.6.1-bin-hadoop2.6\bin\..\lib\datanucleus-rdbms-3.2.9.jar" "-Xms1024M" "-Xmx1024M" "-Dspark.driver.port=55350" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark [email protected]:55350" "--executor-id" "0" "--hostname" "192.168.56.1" "--cores" "8" "--app-id" "app-20160327144433-0000" "--worker-url" "spark [email protected]:52701"
> 16/03/27 14:44:38 INFO Worker: Asked to kill executor app-20160327144433-0000/0
> 16/03/27 14:44:38 INFO ExecutorRunner: Runner thread for executor app-20160327144433-0000/0 interrupted
> 16/03/27 14:44:38 INFO ExecutorRunner: Killing process!
> 16/03/27 14:44:38 INFO Worker: Executor app-20160327144433-0000/0 finished with state KILLED exitStatus 1
> 16/03/27 14:44:38 INFO Worker: Cleaning up local directories for application app-20160327144433-0000
> 16/03/27 14:44:38 INFO ExternalShuffleBlockResolver: Application app-20160327144433-0000 removed, cleanupLocalDirs = true

**IntelliJ IDEA Output**

> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
> 16/03/27 15:06:04 INFO SparkContext: Running Spark version 1.6.1
> 16/03/27 15:06:05 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 16/03/27 15:06:05 INFO SecurityManager: Changing view acls to: tjoha
> 16/03/27 15:06:05 INFO SecurityManager: Changing modify acls to: tjoha
> 16/03/27 15:06:05 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(tjoha); users with modify permissions: Set(tjoha)
> 16/03/27 15:06:06 INFO Utils: Successfully started service 'sparkDriver' on port 56183.
> 16/03/27 15:06:07 INFO Slf4jLogger: Slf4jLogger started
> 16/03/27 15:06:07 INFO Remoting: Starting remoting
> 16/03/27 15:06:07 INFO Remoting: Remoting started; listening on addresses :[akka tcp [email protected]:56196]
> 16/03/27 15:06:07 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 56196.
> 16/03/27 15:06:07 INFO SparkEnv: Registering MapOutputTracker
> 16/03/27 15:06:07 INFO SparkEnv: Registering BlockManagerMaster
> 16/03/27 15:06:07 INFO DiskBlockManager: Created local directory at C:\Users\tjoha\AppData\Local\Temp\blockmgr-9623b0f9-81f5-4a10-bbc7-ba077d53a2e5
> 16/03/27 15:06:07 INFO MemoryStore: MemoryStore started with capacity 2.4 GB
> 16/03/27 15:06:07 INFO SparkEnv: Registering OutputCommitCoordinator
> 16/03/27 15:06:07 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
> 16/03/27 15:06:07 INFO Utils: Successfully started service 'SparkUI' on port 4041.
> 16/03/27 15:06:07 INFO SparkUI: Started SparkUI at http://192.168.56.1:4041
> 16/03/27 15:06:08 INFO AppClient$ClientEndpoint: Connecting to master spark://tjvrlaptop:7077...
> 16/03/27 15:06:09 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20160327150608-0002
> 16/03/27 15:06:09 INFO AppClient$ClientEndpoint: Executor added: app-20160327150608-0002/0 on worker-20160327150550-192.168.56.1-56057 (192.168.56.1:56057) with 8 cores
> 16/03/27 15:06:09 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160327150608-0002/0 on hostPort 192.168.56.1:56057 with 8 cores, 1024.0 MB RAM
> 16/03/27 15:06:09 INFO AppClient$ClientEndpoint: Executor updated: app-20160327150608-0002/0 is now RUNNING
> 16/03/27 15:06:09 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 56234.
> 16/03/27 15:06:09 INFO NettyBlockTransferService: Server created on 56234
> 16/03/27 15:06:09 INFO BlockManagerMaster: Trying to register BlockManager
> 16/03/27 15:06:09 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.56.1:56234 with 2.4 GB RAM, BlockManagerId(driver, 192.168.56.1, 56234)
> 16/03/27 15:06:09 INFO BlockManagerMaster: Registered BlockManager
> 16/03/27 15:06:09 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
> 16/03/27 15:06:10 INFO SparkContext: Starting job: reduce at SparkPi.scala:37
> 16/03/27 15:06:10 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:37) with 20 output partitions
> 16/03/27 15:06:10 INFO DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:37)
> 16/03/27 15:06:10 INFO DAGScheduler: Parents of final stage: List()
> 16/03/27 15:06:10 INFO DAGScheduler: Missing parents: List()
> 16/03/27 15:06:10 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:33), which has no missing parents
> 16/03/27 15:06:10 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1880.0 B, free 1880.0 B)
> 16/03/27 15:06:10 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1212.0 B, free 3.0 KB)
> 16/03/27 15:06:10 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.56.1:56234 (size: 1212.0 B, free: 2.4 GB)
> 16/03/27 15:06:10 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
> 16/03/27 15:06:10 INFO DAGScheduler: Submitting 20 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:33)
> 16/03/27 15:06:10 INFO TaskSchedulerImpl: Adding task set 0.0 with 20 tasks
> 16/03/27 15:06:14 INFO SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (TJVRLAPTOP:56281) with ID 0
> 16/03/27 15:06:14 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, TJVRLAPTOP, partition 0,PROCESS_LOCAL, 2078 bytes)

> ...

> TJVRLAPTOP, partition 6,PROCESS_LOCAL, 2078 bytes)
> 16/03/27 15:06:14 INFO TaskSetManager: Starting task 7.0 in stage 0.0 (TID 7, TJVRLAPTOP, partition 7,PROCESS_LOCAL, 2078 bytes)
> 16/03/27 15:06:14 INFO BlockManagerMasterEndpoint: Registering block manager TJVRLAPTOP:56319 with 511.1 MB RAM, BlockManagerId(0, TJVRLAPTOP, 56319)
> 16/03/27 15:06:15 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on TJVRLAPTOP:56319 (size: 1212.0 B, free: 511.1 MB)
> 16/03/27 15:06:15 INFO TaskSetManager: Starting task 8.0 in stage 0.0 (TID 8, TJVRLAPTOP, partition 8,PROCESS_LOCAL, 2078 bytes)
> 16/03/27 15:06:15 INFO TaskSetManager: Starting task 9.0 in stage 0.0 (TID 9, TJVRLAPTOP, partition 9,PROCESS_LOCAL, 2078 bytes)
> 16/03/27 15:06:15 INFO TaskSetManager: Starting task 10.0 in stage 0.0 (TID 10, TJVRLAPTOP, partition 10,PROCESS_LOCAL, 2078 bytes)
> 16/03/27 15:06:15 INFO TaskSetManager: Starting task 11.0 in stage 0.0 (TID 11, TJVRLAPTOP, partition 11,PROCESS_LOCAL, 2078 bytes)

> ...

> java.lang.ClassNotFoundException: SparkPi$$anonfun$main$1$$anonfun$1
>   at java.net.URLClassLoader.findClass(Unknown Source)
>   at java.lang.ClassLoader.loadClass(Unknown Source)
>   at java.lang.ClassLoader.loadClass(Unknown Source)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Unknown Source)
>   at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:68)
>   at java.io.ObjectInputStream.readNonProxyDesc(Unknown Source)
>   at java.io.ObjectInputStream.readClassDesc(Unknown Source)
>   at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
>   at java.io.ObjectInputStream.readObject0(Unknown Source)
>   at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
>   at java.io.ObjectInputStream.readSerialData(Unknown Source)
>   at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
>   at java.io.ObjectInputStream.readObject0(Unknown Source)
>   at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
>   at java.io.ObjectInputStream.readSerialData(Unknown Source)
>   at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
>   at java.io.ObjectInputStream.readObject0(Unknown Source)
>   at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
>   at java.io.ObjectInputStream.readSerialData(Unknown Source)
>   at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
>   at java.io.ObjectInputStream.readObject0(Unknown Source)
>   at java.io.ObjectInputStream.readObject(Unknown Source)
>   at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
>   at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>   at org.apache.spark.scheduler.Task.run(Task.scala:89)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>   at java.lang.Thread.run(Unknown Source)
> 16/03/27 15:06:15 INFO TaskSetManager: Lost task 5.0 in stage 0.0 (TID 5) on executor TJVRLAPTOP: java.lang.ClassNotFoundException (SparkPi$$anonfun$main$1$$anonfun$1) [duplicate 1]
> 16/03/27 15:06:15 INFO TaskSetManager: Lost task 3.0 in stage 0.0 (TID 3) on executor TJVRLAPTOP: java.lang.ClassNotFoundException

> ...
 
> INFO TaskSetManager: Starting task 10.1 in stage 0.0 (TID 20, TJVRLAPTOP, partition 10,PROCESS_LOCAL, 2078 bytes)

> ...

> TJVRLAPTOP, partition 3,PROCESS_LOCAL, 2078 bytes)
> 16/03/27 15:06:15 INFO TaskSetManager: Lost task 4.0 in stage 0.0 (TID 4) on executor TJVRLAPTOP: java.lang.ClassNotFoundException (SparkPi$$anonfun$main$1$$anonfun$1) [duplicate 8]
> 16/03/27 15:06:15 INFO TaskSetManager: Lost task 12.0 in stage 0.0 (TID 12) on executor TJVRLAPTOP: java.lang.ClassNotFoundException

> ...

> INFO TaskSetManager: Starting task 2.3 in stage 0.0 (TID 39, TJVRLAPTOP, partition 2,PROCESS_LOCAL, 2078 bytes)
> 16/03/27 15:06:16 MapOutputTrackerMasterEndpoint stopped!
> 16/03/27 15:06:16 WARN TransportChannelHandler: Exception in connection from TJVRLAPTOP/192.168.56.1:56281 java.io.IOException: An existing connection was forcibly closed by the remote host
> 16/03/27 15:06:17 INFO MemoryStore: MemoryStore cleared
> 16/03/27 15:06:17 INFO BlockManager: BlockManager stopped
> 16/03/27 15:06:17 INFO BlockManagerMaster: BlockManagerMaster stopped
> 16/03/27 15:06:17 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
> 16/03/27 15:06:17 INFO SparkContext: Successfully stopped SparkContext
> 16/03/27 15:06:17 INFO ShutdownHookManager: Shutdown hook called
> 16/03/27 15:06:17 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
> 16/03/27 15:06:17 INFO ShutdownHookManager: Deleting directory C:\Users\tjoha\AppData\Local\Temp\spark-11f8184f-23fb-43be-91bb-113fb74aa8b9


Solution 1:[1]

When you run in embedded mode (local[*]), Spark already has all of the required application code on its classpath.

When you run against a standalone cluster, you have to make that code available to Spark explicitly, for example by copying your application JAR to the lib folder.
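
A rough sketch of one way to do this from the IDE (an alternative to copying the JAR by hand, and only an assumption about the intended setup): package the project with sbt package and pass the resulting JAR to SparkConf.setJars, so the driver ships the application classes, including the SparkPi$$anonfun closures from the stack trace, to the executors. The JAR path below is inferred from the build.sbt in the question; adjust it to wherever sbt actually writes your artifact.

import org.apache.spark.{SparkConf, SparkContext}

object SparkPiOnCluster {
  def main(args: Array[String]): Unit = {
    // Assumed artifact location after `sbt package` (name "SBTScalaSparkPi",
    // version 1.0, Scala 2.10, per the build.sbt above).
    val appJar = "target/scala-2.10/sbtscalasparkpi_2.10-1.0.jar"

    val conf = new SparkConf()
      .setMaster("spark://tjvrlaptop:7077")
      .setAppName("Spark Pi")
      // Ship the application JAR to the executors so they can deserialize
      // the closures defined in SparkPi.
      .setJars(Seq(appJar))

    val spark = new SparkContext(conf)
    // ... run the reduce job exactly as in the question ...
    spark.stop()
  }
}

Packaging the application and submitting it with spark-submit --master spark://tjvrlaptop:7077 achieves the same effect, since spark-submit distributes the application JAR to the executors for you.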

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Reddy