Py4JJavaError while using streaming with PySpark

For the following code:

%%time
steps = df.select("step").distinct().collect()
for step in steps:
    _df = df.where(f"step = {step[0]}")
    # by adding coalesce(1) we save the dataframe to one file
    _df.coalesce(1).write.mode("append").option("header", "true").csv("paysim1")

I am getting the following error:

---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_4892/519786224.py in <module>
      2     _df = df.where(f"step = {step[0]}")
      3 #    by adding coalesce(1) we save the dataframe to one file
----> 4     _df.coalesce(1).write.mode("append").option("header", "true").csv("paysim1")
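
The traceback above stops at the Python frame and does not show the underlying Java exception, which is usually the part that identifies the cause. Below is a minimal sketch of one way to surface it. It assumes the raised error is py4j's Py4JJavaError, which exposes the wrapped Java Throwable as its `java_exception` attribute. The helper name `root_java_cause` and the `max_depth` parameter are hypothetical and only for illustration.

```python
def root_java_cause(err, max_depth=20):
    """Return a string for the innermost Java exception wrapped by err.

    Assumes err is a py4j Py4JJavaError, which exposes the wrapped Java
    Throwable as `java_exception`. Any other exception falls back to str(err).
    """
    java_exc = getattr(err, "java_exception", None)
    if java_exc is None:
        return str(err)
    # Follow Throwable.getCause() until there is no further cause
    # (max_depth is a safety cap on the number of hops).
    for _ in range(max_depth):
        cause = java_exc.getCause()
        if cause is None:
            break
        java_exc = cause
    return java_exc.toString()


# Hypothetical usage around the failing write from the question:
# try:
#     _df.coalesce(1).write.mode("append").option("header", "true").csv("paysim1")
# except Exception as err:
#     print(root_java_cause(err))
#     raise
```

The string this returns (the innermost Java exception class and message) is the detail worth including alongside the Python traceback.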

How can I resolve this error?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
