Driver memory not getting cleaned up in Spark Structured Streaming
I am using a Spark Structured Streaming job to perform ETL on AWS, and the driver memory is not getting cleared up. The job reads events from Kinesis and writes to S3. Below are my Spark configurations. I am also attaching a screenshot of the driver JVM heap usage graph for reference (1 means 100%).
spark.cleaner.periodicGC.interval=1min
spark.driver.extraJavaOptions=-XX:+UseG1GC
spark.cleaner.referenceTracking.blocking=false
Tags: amazon-web-services, apache-spark, pyspark, spark-streaming, spark-structured-streaming
Solution 1:[1]
We have seen this when the Spark UI on YARN listed a large number of jobs. Limiting the number of jobs (for example, to 100), tasks, and stages shown in the UI helped decrease driver memory usage. You can give the following settings a try:
spark.ui.retainedJobs, spark.ui.retainedStages, spark.ui.retainedTasks, spark.worker.ui.retainedExecutors, spark.worker.ui.retainedDrivers, spark.sql.ui.retainedExecutions, spark.streaming.ui.retainedBatches
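As a sketch, these limits can be passed as `--conf` flags on `spark-submit` alongside the asker's existing settings. The values below (and the script name `my_streaming_job.py`) are illustrative placeholders, not tuned recommendations:

```shell
# Illustrative spark-submit invocation capping Spark UI retention to reduce
# driver-side bookkeeping; adjust the limits to your workload.
spark-submit \
  --conf spark.ui.retainedJobs=100 \
  --conf spark.ui.retainedStages=100 \
  --conf spark.ui.retainedTasks=1000 \
  --conf spark.worker.ui.retainedExecutors=100 \
  --conf spark.worker.ui.retainedDrivers=100 \
  --conf spark.sql.ui.retainedExecutions=100 \
  --conf spark.streaming.ui.retainedBatches=100 \
  my_streaming_job.py
```

These properties can equivalently be set on `SparkSession.builder.config(...)` or in `spark-defaults.conf`; the effect is the same either way.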
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Vindhya G |

