Starting a PySpark session takes a lot of time
Hello, I am new to PySpark and I'm stuck on this line of code:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('HelloWorld').getOrCreate()
The Spark session never finishes starting. I've waited for more than 100 minutes and nothing happens; it is still running. Can anyone explain how to resolve this problem?
Solution 1:[1]
Try specifying the master explicitly and see if it helps. It appears that your Spark session is unable to locate the master daemon.
spark = SparkSession.builder.master("local").appName("test").getOrCreate()
Solution 2:[2]
As suggested in the other answer, there might be an issue reaching the Spark server. If you can start a session with master('local'), then that is most likely the issue.
If you are connecting to a remote Spark server, the problem might be on that side, for instance a lack of available resources, in which case you will need to contact the server's administrator.
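Before contacting the administrator, it may be worth checking whether the master's host and port can be reached at all from your machine. The following is a minimal sketch, not an official Spark utility: the helper names parse_master and master_reachable are made up for illustration, it assumes a standalone master URL of the form spark://host:port (7077 is the default port for a standalone master), and it only tests that a TCP connection can be opened, not that the cluster has free resources.

```python
import socket
from urllib.parse import urlparse


def parse_master(master_url):
    """Split a standalone master URL such as spark://host:7077 into (host, port)."""
    parsed = urlparse(master_url)
    if parsed.scheme != "spark" or not parsed.hostname or parsed.port is None:
        raise ValueError(f"expected spark://host:port, got {master_url!r}")
    return parsed.hostname, parsed.port


def master_reachable(master_url, timeout=3.0):
    """Return True if a TCP connection to the master's host and port succeeds."""
    host, port = parse_master(master_url)
    try:
        # Open and immediately close a plain TCP connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

For example, master_reachable("spark://spark-master.example.com:7077") (a hypothetical host name) returning False would point to a network or server problem rather than a problem in your PySpark code.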
Set logging level to DEBUG
To find out what's going on, you can increase the logging verbosity. First, you need to locate the logging (log4j) configuration file:
import os
print(os.environ['SPARK_HOME'])
The file is called log4j.properties and should be found in the conf subfolder of $SPARK_HOME:
print(os.path.join(os.environ['SPARK_HOME'], 'conf'))
If there's no log4j.properties file in conf, there should be a log4j.properties.template. Copy the template to log4j.properties and make sure that it contains these lines (the relevant one is log4j.rootCategory=DEBUG, console):
# Set everything to be logged to the console
log4j.rootCategory=DEBUG, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
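The copy-and-edit step described above could also be scripted. This is only a sketch under stated assumptions: the function name ensure_debug_logging is hypothetical, and it assumes the log4j 1.x style file shown here. Newer Spark releases may ship a log4j2.properties file with a different syntax instead, which this sketch does not handle.

```python
import shutil
from pathlib import Path


def ensure_debug_logging(conf_dir):
    """Create log4j.properties from the template if needed and set the root level to DEBUG."""
    conf = Path(conf_dir)
    props = conf / "log4j.properties"
    template = conf / "log4j.properties.template"

    if not props.exists():
        if not template.exists():
            raise FileNotFoundError(
                f"neither {props.name} nor {template.name} found in {conf}"
            )
        # Copy the template so the original stays untouched.
        shutil.copyfile(template, props)

    lines = props.read_text().splitlines()
    found = False
    for i, line in enumerate(lines):
        # Replace any active (uncommented) rootCategory setting.
        if line.strip().startswith("log4j.rootCategory"):
            lines[i] = "log4j.rootCategory=DEBUG, console"
            found = True
    if not found:
        lines.append("log4j.rootCategory=DEBUG, console")

    props.write_text("\n".join(lines) + "\n")
    return props
```

It could be called as ensure_debug_logging(os.path.join(os.environ['SPARK_HOME'], 'conf')), provided you have write access to that folder.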
Start a new shell or pyspark and see what messages you get when attempting to start a Spark session.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | SherKhan |
| Solution 2 | user2314737 |
