Invalid Spark URL in local Spark session

Since updating to Spark 2.3.0, tests run in my CI (Semaphore) fail due to an allegedly invalid Spark URL when creating the (local) Spark context:

18/03/07 03:07:11 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Invalid Spark URL: spark://HeartbeatReceiver@LXC_trusty_1802-d57a40eb:44610
    at org.apache.spark.rpc.RpcEndpointAddress$.apply(RpcEndpointAddress.scala:66)
    at org.apache.spark.rpc.netty.NettyRpcEnv.asyncSetupEndpointRefByURI(NettyRpcEnv.scala:134)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109)
    at org.apache.spark.util.RpcUtils$.makeDriverRef(RpcUtils.scala:32)
    at org.apache.spark.executor.Executor.<init>(Executor.scala:155)
    at org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:59)
    at org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:126)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2486)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)

The Spark session is created as follows:

val sparkSession: SparkSession = SparkSession
  .builder
  .appName(s"LocalTestSparkSession")
  .config("spark.broadcast.compress", "false")
  .config("spark.shuffle.compress", "false")
  .config("spark.shuffle.spill.compress", "false")
  .master("local[3]")
  .getOrCreate

Before updating to Spark 2.3.0, no problems were encountered with versions 2.2.1 and 2.1.0. Also, running the tests locally works fine.



Solution 1:[1]

Set the SPARK_LOCAL_HOSTNAME environment variable to localhost and try again:

export SPARK_LOCAL_HOSTNAME=localhost

Solution 2:[2]

This has been resolved by setting the SparkSession config "spark.driver.host" to the IP address of the driver machine.

It seems that this change is required from Spark 2.3 onwards.
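
For example, a minimal sketch in Scala based on the session builder from the question ("127.0.0.1" is only a placeholder; substitute the IP address of your driver machine):

import org.apache.spark.sql.SparkSession

// Set spark.driver.host explicitly when building the local session.
// "127.0.0.1" is a placeholder for the driver machine's IP address.
val sparkSession: SparkSession = SparkSession
  .builder
  .appName("LocalTestSparkSession")
  .master("local[3]")
  .config("spark.driver.host", "127.0.0.1")
  .getOrCreate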

Solution 3:[3]

Change your hostname so that it contains NO underscores, which turns

spark://HeartbeatReceiver@LXC_trusty_1802-d57a40eb:44610 into spark://HeartbeatReceiver@LXCtrusty1802d57a40eb:44610

On Ubuntu, as root:

#hostnamectl status
#hostnamectl --static set-hostname LXCtrusty1802d57a40eb

#nano /etc/hosts
    127.0.0.1   LXCtrusty1802d57a40eb
#reboot 

Solution 4:[4]

Try running Spark locally, with as many worker threads as there are logical cores on your machine:

.master("local[*]")

Solution 5:[5]

I would like to complement @Prakash Annadurai's answer by saying:

If you want the variable setting to persist after exiting the terminal, add it to your shell profile (e.g. ~/.bash_profile) with the same command:

export SPARK_LOCAL_HOSTNAME=localhost
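
For example, appending it from a Bash shell (assuming ~/.bash_profile is the profile file your shell actually loads):

# Append the export to the profile and reload it for the current session.
echo 'export SPARK_LOCAL_HOSTNAME=localhost' >> ~/.bash_profile
source ~/.bash_profile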

Solution 6:[6]

For anyone working in a Jupyter Notebook: adding %env SPARK_LOCAL_HOSTNAME=localhost to the very beginning of the cell solved it for me, like so:

%env SPARK_LOCAL_HOSTNAME=localhost

import findspark
findspark.init()

from pyspark import SparkConf, SparkContext
conf = SparkConf().setMaster("local").setAppName("Test")
sc = SparkContext(conf = conf)

Solution 7:[7]

As mentioned in the answers above, you need to set SPARK_LOCAL_HOSTNAME to localhost. On Windows, you have to use the SET command: SET SPARK_LOCAL_HOSTNAME=localhost

However, SET is temporary: you may have to run it again in every new terminal. Instead, you can use the SETX command, which is permanent.

SETX SPARK_LOCAL_HOSTNAME localhost

You can run the above command from anywhere: just open a command prompt and execute it. Note that, unlike SET, SETX does not accept an equals sign; you need to separate the environment variable and the value with a space.

If it succeeds, you will see a message like "SUCCESS: Specified value was saved".

You can also verify that the variable was added by typing SET in a different command prompt (or SET S, which lists variables starting with the letter 'S'). You should see SPARK_LOCAL_HOSTNAME=localhost in the results, which will not happen if you use SET instead of SETX.
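
For example, in a fresh command prompt (a quick check, assuming the SETX command above has already been run):

REM List environment variables starting with 'S'; SPARK_LOCAL_HOSTNAME=localhost should appear.
SET S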

Solution 8:[8]

If you don't want to change the environment variable, you can instead change the code to add the config in the SparkSession builder (as Hanisha said above).

In PySpark:

spark = SparkSession.builder.config("spark.driver.host", "localhost").getOrCreate()

Solution 9:[9]

Setting .config("spark.driver.host", "localhost") fixed the issue for me.

        SparkSession spark = SparkSession
            .builder()
            .config("spark.master", "local")
            .config("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
            .config("spark.hadoop.fs.s3a.buffer.dir", "/tmp")
            .config("spark.driver.memory", "2048m")
            .config("spark.executor.memory", "2048m")
            .config("spark.driver.bindAddress", "127.0.0.1")
            .config("spark.driver.host", "localhost")
            .getOrCreate();

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Tim Diekmann
Solution 2 Nagireddy Hanisha
Solution 3 user3008410
Solution 4 YohanT
Solution 5 slfan
Solution 6 AaronDT
Solution 7
Solution 8 Felipe Zschornack
Solution 9 deepb1ue