How can I avoid OOM errors in an AWS Glue job in PySpark?

I am getting the following error while running an AWS Glue job with 40 workers processing 40 GB of data:

Caused by: org.apache.spark.memory.SparkOutOfMemoryError: error while calling spill() on org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@5fa14240 : No space left on device

How can I optimize my job to avoid this error in PySpark?

[Screenshot: AWS Glue job metrics (glue_metrics)]



Solution 1:

Use the AWS Glue Spark shuffle manager with Amazon S3. It writes shuffle and spill files to an S3 bucket instead of the workers' local disks, so a large shuffle no longer fails with "No space left on device" when it exceeds local storage.

This requires Glue 2.0 or later.
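The shuffle manager is enabled through Glue job parameters. A minimal sketch, assuming Glue 2.0; the bucket name and prefix are placeholders, not from the original post:

```
# AWS Glue job parameters (key/value pairs) that route shuffle files
# and sorter spills to Amazon S3 instead of the workers' local disks.
# "my-shuffle-bucket/shuffle/" is a hypothetical bucket/prefix.
--write-shuffle-files-to-s3     true
--write-shuffle-spills-to-s3    true
--conf  spark.shuffle.glue.s3ShuffleBucket=s3://my-shuffle-bucket/shuffle/
```

With these set, spills from UnsafeExternalSorter go to S3 rather than to the limited local volume that produced the "No space left on device" error. Note that shuffling over S3 is slower than local disk, so it is a trade of speed for reliability on shuffle-heavy jobs.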

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1: semaphore