'Java gateway process exited before sending its port number

I am trying to install PySpark on my Windows 10 to be used on Jupyter Lab. I have already installed Java and running Python 3.7.3:

openjdk version "1.8.0_242"
OpenJDK Runtime Environment Corretto-8.242.08.1 (build 1.8.0_242-b08)
OpenJDK Server VM Corretto-8.242.08.1 (build 25.242-b08, mixed mode)

I installed PySpark and Findspark from the Anaconda cmd following this tutorial.

First thing is that I am not able to start the PySpark shell from the Anaconda cmd:

(base) C:\> pyspark
Python was not found; run without arguments to install from the Microsoft Store, or disable this shortcut from Settings > Manage App Execution Aliases.
The system cannot find the path specified.
The system cannot find the path specified.

But Python is installed in that path:

(base) C:\>python --version
Python 3.7.3

Next I tried to run PySpark on JupyterLab and I got stuck on this error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-2-ea9424e4667c> in <module>
      1 from pyspark.sql import SparkSession
      2 
----> 3 spark = SparkSession.builder.getOrCreate()
      4 
      5 df = spark.sql("select 'spark' as hello ")

C:\Anaconda3\lib\site-packages\pyspark\sql\session.py in getOrCreate(self)
    226                             sparkConf.set(key, value)
    227                         # This SparkContext may be an existing one.
--> 228                         sc = SparkContext.getOrCreate(sparkConf)
    229                     # Do not update `SparkConf` for existing `SparkContext`, as it's shared
    230                     # by all sessions.

C:\Anaconda3\lib\site-packages\pyspark\context.py in getOrCreate(cls, conf)
    390         with SparkContext._lock:
    391             if SparkContext._active_spark_context is None:
--> 392                 SparkContext(conf=conf or SparkConf())
    393             return SparkContext._active_spark_context
    394 

C:\Anaconda3\lib\site-packages\pyspark\context.py in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
    142                 " is not allowed as it is a security risk.")
    143 
--> 144         SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
    145         try:
    146             self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,

C:\Anaconda3\lib\site-packages\pyspark\context.py in _ensure_initialized(cls, instance, gateway, conf)
    337         with SparkContext._lock:
    338             if not SparkContext._gateway:
--> 339                 SparkContext._gateway = gateway or launch_gateway(conf)
    340                 SparkContext._jvm = SparkContext._gateway.jvm
    341 

C:\Anaconda3\lib\site-packages\pyspark\java_gateway.py in launch_gateway(conf, popen_kwargs)
    106 
    107             if not os.path.isfile(conn_info_file):
--> 108                 raise RuntimeError("Java gateway process exited before sending its port number")
    109 
    110             with open(conn_info_file, "rb") as info:

RuntimeError: Java gateway process exited before sending its port number

What am I doing wrong?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source