Error on PySpark during the pipelining part of a machine learning project

I am learning Spark as part of a course and decided to build a project with MLlib. During the pipelining step I got this error:


py4j.protocol.Py4JError: org.apache.spark does not exist in the JVM

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/fanis/.local/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 1934, in showtraceback
    stb = value._render_traceback_()
AttributeError: 'Py4JError' object has no attribute '_render_traceback_'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/fanis/.local/lib/python3.8/site-packages/py4j/clientserver.py", line 480, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/fanis/.local/lib/python3.8/site-packages/py4j/java_gateway.py", line 1038, in send_command
    response = connection.send_command(command)
  File "/home/fanis/.local/lib/python3.8/site-packages/py4j/clientserver.py", line 503, in send_command
    raise Py4JNetworkError(
py4j.protocol.Py4JNetworkError: Error while sending or receiving
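
A note on reading the output above: each "During handling of the above exception, another exception occurred" line is Python's implicit exception chaining, so the topmost error is the earliest one raised and is often (though not always) the place to start. The sketch below reproduces that chaining in plain Python without Spark. `FakeGatewayError` and `root_cause` are hypothetical names made up for illustration; they are not part of py4j or PySpark.

```python
def root_cause(exc):
    """Follow __cause__/__context__ links back to the earliest exception."""
    seen = set()
    while True:
        earlier = exc.__cause__ or exc.__context__
        if earlier is None or id(earlier) in seen:
            return exc
        seen.add(id(exc))
        exc = earlier


class FakeGatewayError(Exception):
    """Hypothetical stand-in for py4j's Py4JError (not the real class)."""


def failing_call():
    try:
        raise FakeGatewayError("org.apache.spark does not exist in the JVM")
    except FakeGatewayError:
        # Raising inside an except block sets __context__ on the new
        # exception; that link is what produces the "During handling of
        # the above exception, another exception occurred" message.
        raise ConnectionError("Answer from Java side is empty")


try:
    failing_call()
except ConnectionError as err:
    first = root_cause(err)
    print(type(first).__name__, "->", first)
```

Applied to the traceback in this question, the same reading suggests looking at the `Py4JError` first and treating the later `AttributeError` and `Py4JNetworkError` entries as follow-on failures, though that is an assumption about this particular run rather than something the output proves.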

Here's the part of the code that caused this error:

from pyspark.ml import Pipeline
# `stages` holds the StringIndexer and OneHotEncoder stages defined in the previous cell
pipeline = Pipeline().setStages(stages)
model = pipeline.fit(train)

pp_df = model.transform(test)

I would rather not be handed a solution. I'd like help understanding what the error means so that I can work out what caused it and actually learn. Thanks in advance!



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
