Py4JJavaError while using streaming with PySpark
For the following code:
%%time
steps = df.select("step").distinct().collect()
for step in steps[:]:
    _df = df.where(f"step = {step[0]}")
    # by adding coalesce(1) we save the dataframe to one file
    _df.coalesce(1).write.mode("append").option("header", "true").csv("paysim1")
I am getting the following error:
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_4892/519786224.py in <module>
      2     _df = df.where(f"step = {step[0]}")
      3     # by adding coalesce(1) we save the dataframe to one file
----> 4     _df.coalesce(1).write.mode("append").option("header", "true").csv("paysim1")
How can I fix this error?
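The traceback above is cut off before the Java-side exception, which is usually where the actual cause of a Py4JJavaError is reported. As a minimal diagnostic sketch (not a fix), the helper below collects the wrapped Java exception and its chain of causes. The name `java_cause_chain` and the `max_depth` parameter are hypothetical; the sketch assumes only that the caught exception exposes a `java_exception` attribute, as py4j's `Py4JJavaError` does, and that the wrapped object offers the standard `java.lang.Throwable` methods `toString()` and `getCause()`.

```python
def java_cause_chain(exc, max_depth=20):
    """Return the Java exception messages wrapped by a Py4JJavaError.

    Assumption: `exc.java_exception` is a Java Throwable proxy with
    toString() and getCause(). Any exception without that attribute
    yields an empty list.
    """
    messages = []
    java_exc = getattr(exc, "java_exception", None)
    # Walk outermost exception -> root cause, with a depth cap as a safeguard.
    while java_exc is not None and len(messages) < max_depth:
        messages.append(str(java_exc.toString()))
        java_exc = java_exc.getCause()
    return messages


# Hypothetical usage around the failing write from the question:
#
# from py4j.protocol import Py4JJavaError
# try:
#     _df.coalesce(1).write.mode("append").option("header", "true").csv("paysim1")
# except Py4JJavaError as exc:
#     for depth, message in enumerate(java_cause_chain(exc)):
#         print(f"cause {depth}: {message}")
#     raise
```

The last message in the returned list is the root cause on the JVM side, which is typically the most useful line to include when asking for help with this error.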
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
