How can I avoid OOM errors in an AWS Glue PySpark job?
I am getting the following error while running an AWS Glue job with 40 workers processing 40 GB of data:
Caused by: org.apache.spark.memory.SparkOutOfMemoryError: error while calling spill() on org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@5fa14240 : No space left on device
How can I optimize my job to avoid this error in PySpark?
[Screenshot of the Glue job metrics (glue_metrics) was attached here.]
Solution 1:[1]
Use the AWS Glue Spark shuffle manager with Amazon S3. It writes shuffle and spill files to an S3 bucket instead of the workers' local disks, which avoids the "No space left on device" failure when shuffle data outgrows local storage. This feature requires Glue 2.0 or later; see the AWS Glue documentation on the S3 shuffle manager for details.
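As a minimal sketch, the shuffle manager is typically enabled through Glue job parameters (set in the console, or in `DefaultArguments` when creating the job via boto3). The parameter names below follow the AWS Glue 2.0 shuffle-manager documentation; the bucket name and prefix are placeholders you would replace with your own:

```python
def s3_shuffle_arguments(bucket: str, prefix: str) -> dict:
    """Build Glue job arguments that redirect shuffle data to S3."""
    return {
        # Write shuffle data files to S3 instead of the workers' local disks.
        "--write-shuffle-files-to-s3": "true",
        # Also write shuffle spill files to S3.
        "--write-shuffle-spills-to-s3": "true",
        # S3 location where Spark stores the shuffle data.
        "--conf": f"spark.shuffle.glue.s3ShuffleBucket=s3://{bucket}/{prefix}",
    }

# Placeholder bucket/prefix; pass this dict as DefaultArguments to the job.
args = s3_shuffle_arguments("my-glue-shuffle-bucket", "shuffle-data")
```

Note that the S3 prefix should be cleaned up periodically (e.g. with a lifecycle rule), since shuffle files accumulate across job runs.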
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | semaphore |
