Why did optimizing my Spark config slow it down?
I'm running Spark 2.2 (legacy codebase constraints) on an AWS EMR cluster (version 5.8), and I run a system of both Spark and Hadoop jobs on the cluster daily. I noticed that the configs submitted to EMR for Spark when spinning up the cluster have some values that are not "optimal" according to the official AWS best-practices guide for managing memory.
When configuring the cluster, the current spark-defaults were set as follows:
{
  "classification": "spark-defaults",
  "properties": {
    "spark.executor.memory": "16g",
    "spark.shuffle.io.maxRetries": "10",
    "spark.driver.memory": "16g",
    "spark.kryoserializer.buffer.max": "2000m",
    "spark.rpc.message.maxSize": "1024",
    "spark.local.dir": "/mnt/tmp",
    "spark.driver.cores": "4",
    "spark.driver.maxResultSize": "10g",
    "spark.executor.cores": "4",
    "spark.speculation": "true",
    "spark.default.parallelism": "960",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.yarn.executor.memoryOverhead": "4g",
    "spark.task.maxFailures": "10"
  }
}
Going by the guide, I made the updates to:
{
  "classification": "spark-defaults",
  "properties": {
    "spark.executor.memory": "27g",
    "spark.shuffle.io.maxRetries": "10",
    "spark.driver.memory": "27g",
    "spark.kryoserializer.buffer.max": "2000m",
    "spark.rpc.message.maxSize": "1024",
    "spark.local.dir": "/mnt/tmp",
    "spark.driver.cores": "3",
    "spark.driver.maxResultSize": "20g",
    "spark.executor.cores": "3",
    "spark.speculation": "true",
    "spark.default.parallelism": "228",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.executor.instances": "37",
    "spark.yarn.executor.memoryOverhead": "3g",
    "spark.task.maxFailures": "10"
  }
}
My cluster runs 1 master node and 19 core nodes, all on-demand, all of type r3.2xlarge (8 vCPUs, 61 GiB memory, 160 GB SSD storage).
Note the following changes and the reasoning:
"spark.executor.cores": "4", -> "3"// The optimal amount of cores/executor is 5, but go down if the math doesn't work out well based on the instance type. Since my instances have 8 cores, and you need 1 for daemon processes, I rounded down to 3 cores, 1 for the daemon (and 1 leftover). Having it at 4 meant that the executors were evenly divided, but nothing was leftover for the daemon processes - which is not good, I've read."spark.executor.memory": "16g", -> "27g"and"spark.yarn.executor.memoryOverhead": "4g", -> "3g": I get 2 executors per instance. Each instance has 61 gib of memory, and I need to save 10% for executor overhead. So (61 / 2) * 90%= 27g rounded down. Here I'm dramatically increasing the working memory available to each executor, and only slightly decreasing their overhead."spark.driver.cores": "4", -> "3": the guide says to keep the master the same as the executors. This one doesn't make as much sense, because there's just 1 driver operating on the master node, so I'm leaving a lot of cores unused here. Also I did"spark.driver.memory": "16g" -> "27g"for the same reason of keeping them the same. A big bump again"spark.driver.maxResultSize": "10g", -> "20g": this one was more arbitrary. Since my driver on the master node is only using 27g for memory, assuming another 3g for overhead, I have ~30g of memory sitting unused (I assumed) - so bumped it up."spark.default.parallelism": "960" - "222"- this I bumped way down, because the guide saysspark.executor.instances * spark.executors.cores * 2. So I have 19 core machines with 2 executors each with 3 cores each so that's 19 * 2 * 3 * 2 = 228. But from my understanding, bringing it down here should decrease the parallelism, meaning less context switching, but doing it in the range that makes sense for the machines.
I tried this out with my new configs, expecting better performance from all the extra memory I was giving to the executors and driver, but one of my Spark jobs that normally finishes in 45-50 minutes was still running after 95 minutes. I retried it 3-4 times and cancelled it each time with the same result.
Any idea why my optimizations were making things worse?
Solution 1:[1]
This is the config I got using this cheat sheet for the setup you were intending. Can you try changing your overhead to the full number (3072) instead of 3g?
Also, check your Ganglia monitor while the job is running to see whether more of your memory is being utilized compared to the default config.
https://www.c2fo.io/c2fo/spark/aws/emr/2016/07/06/apache-spark-config-cheatsheet/
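To see why the answer suggests the bare number, here is a minimal sketch (not Spark's actual parser) that normalizes Spark-style memory strings to MiB; `spark.yarn.executor.memoryOverhead` in Spark 2.x is documented as taking a plain MiB value, so "3g" and "3072" should name the same quantity:

```python
def to_mib(value):
    """Convert a Spark-style memory string ('3g', '2000m', '3072') to MiB."""
    value = value.strip().lower()
    units = {"k": 1 / 1024, "m": 1, "g": 1024, "t": 1024 * 1024}
    if value and value[-1] in units:
        return int(float(value[:-1]) * units[value[-1]])
    return int(value)  # a unitless value is taken as MiB here

print(to_mib("3g"), to_mib("3072"), to_mib("2000m"))
```

If both forms resolve to 3072 MiB, any behavior difference would point to how EMR/Spark parses the suffixed form for this particular property rather than to the amount itself.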
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Ilya P |

