Issue with display()/collect() on a large DataFrame in PySpark

I'm getting the following error in PySpark when performing a display()/collect() operation on a generated DataFrame. The DataFrame contains a single column and a single row, and the content of that one cell is a 2 GB JSON dump.
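
For reference, a minimal sketch of the setup (the column name is a placeholder, and the payload here is a tiny stand-in for the real ~2 GB JSON string produced upstream):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Single-row, single-column DataFrame; in the real job this one
    # cell holds a ~2 GB JSON string.
    huge_json = '{"key": "value"}'  # tiny stand-in for the 2 GB dump
    df = spark.createDataFrame([(huge_json,)], ["payload"])

    df.collect()   # this is the call that fails at 2 GB
    # display(df)  # same failure from a Databricks notebook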

I'm running this on Databricks with 3.4xlarge nodes and 4 workers. Does anyone have a known solution to this problem?

The error:

    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 11.0 failed 4 times, most recent failure: Lost task 0.3 in stage 11.0 (TID 53, 10.172.250.33, executor 5): ExecutorLostFailure (executor 5 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
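
For what it's worth, the only direction I've thought of so far is to avoid pulling the row back to the driver at all and instead write the column straight to storage. A sketch of that idea (untested at this scale; the output path is a placeholder):

    # Skip collect()/display() and write the single string column out as
    # text, so the 2 GB value never has to be serialized back to the
    # driver in one piece.
    (df.write
       .mode("overwrite")
       .text("dbfs:/tmp/payload_dump"))  # placeholder path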


