Error in PySpark during the pipeline step of a machine learning project
I am learning Spark as part of a course and decided to build a project with MLlib. While fitting and applying the pipeline, I got this error:
py4j.protocol.Py4JError: org.apache.spark does not exist in the JVM
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/fanis/.local/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 1934, in showtraceback
stb = value._render_traceback_()
AttributeError: 'Py4JError' object has no attribute '_render_traceback_'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/fanis/.local/lib/python3.8/site-packages/py4j/clientserver.py", line 480, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/fanis/.local/lib/python3.8/site-packages/py4j/java_gateway.py", line 1038, in send_command
response = connection.send_command(command)
File "/home/fanis/.local/lib/python3.8/site-packages/py4j/clientserver.py", line 503, in send_command
raise Py4JNetworkError(
py4j.protocol.Py4JNetworkError: Error while sending or receiving
Here's the part of the code that caused this error:
from pyspark.ml import Pipeline
# `stages` was built in the previous cell from a StringIndexer and a one-hot encoder
pipeline = Pipeline().setStages(stages)
model = pipeline.fit(train)
pp_df = model.transform(test)
I would prefer not to be handed a solution, but rather some help understanding what the error means, so that I can find the cause myself and actually learn. Thanks in advance!
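As an aid to reading the output above: the repeated line "During handling of the above exception, another exception occurred" is Python's implicit exception chaining. Each later exception was raised while an earlier one was still being handled, so the first exception printed (here the Py4JError) is the earliest failure and the ones below it are follow-on errors. The sketch below reproduces that pattern in plain Python without Spark. All names in it (GatewayError, call_backend, render, run, chain_of) are hypothetical and exist only for illustration; it is not how PySpark or py4j are implemented.

```python
import traceback


class GatewayError(Exception):
    """Hypothetical stand-in for an error such as Py4JError."""


def call_backend():
    # Hypothetical root failure: the first thing that goes wrong.
    raise GatewayError("backend is not reachable")


def render(exc):
    # Hypothetical follow-up failure: the attribute does not exist,
    # so this raises AttributeError while GatewayError is being handled.
    return exc._render_traceback_()


def run():
    try:
        call_backend()
    except GatewayError as exc:
        render(exc)  # raises AttributeError inside the except block


def chain_of(exc):
    """Return exception type names, from the last one raised back to the first."""
    names = []
    while exc is not None:
        names.append(type(exc).__name__)
        exc = exc.__context__  # the exception that was being handled, if any
    return names


try:
    run()
except AttributeError as last:
    caught = last  # keep a reference; `last` is cleared when the block ends
    # Format the full chained traceback as text, oldest exception first.
    report = "".join(
        traceback.format_exception(type(last), last, last.__traceback__)
    )

print(report)
print(chain_of(caught))
```

Walking `__context__` from the last exception leads back to the original one, which is usually the best place to start investigating.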
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
