How to define memory allocation for Spark jobs?

I'm learning about memory allocation for Spark jobs running in a cluster. There is a lot of content available, but it only gives generic details.

What I'm looking for is a bit different.

Say I have 20 nodes in my Spark cluster, and many Spark jobs are scheduled to run on it. Some process small amounts of data and others process large amounts. But sometimes the data volume can increase for the normally small jobs too, and vice versa.

Many jobs run in parallel too.

- I would like to know: what would be an efficient way to allocate memory to Spark jobs in this case?

- Do we need to set memory allocation depending on the amount of data each Spark job is going to process? I don't think that would be a good approach, because the data volume can vary.

- Is it good practice to calculate all config values (number of executors, executor memory, cores per executor, etc.) based on the complete Spark cluster (number of nodes) and assign the same config to all Spark jobs irrespective of the data size each job processes? (See the sketch after this list for the kind of calculation I mean.)
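
For context, this is the kind of hardware-only sizing I have in mind for the last bullet: a rough sketch, assuming hypothetical 16-core / 64 GB nodes and the commonly cited rules of thumb (about 5 cores per executor, roughly 10% executor memory overhead). The node specs and constants here are assumptions, not values from my actual cluster.

```python
# Rough executor-sizing sketch based only on cluster hardware
# (hypothetical node specs; 5 cores per executor and ~10% overhead
# are common rules of thumb, not hard requirements).

NODES = 20                 # nodes in the cluster
CORES_PER_NODE = 16        # assumed cores per node
MEM_PER_NODE_GB = 64       # assumed RAM per node

usable_cores = CORES_PER_NODE - 1          # reserve 1 core per node for OS/daemons
cores_per_executor = 5                     # rule of thumb for good HDFS throughput
executors_per_node = usable_cores // cores_per_executor

total_executors = executors_per_node * NODES - 1   # minus 1 for the driver/AM

mem_per_executor = (MEM_PER_NODE_GB - 1) // executors_per_node   # 1 GB for the OS
overhead_gb = max(0.384, 0.10 * mem_per_executor)                # mirrors default memoryOverhead
executor_memory_gb = int(mem_per_executor - overhead_gb)

print(f"--num-executors {total_executors} "
      f"--executor-cores {cores_per_executor} "
      f"--executor-memory {executor_memory_gb}g")
```

With these assumed numbers it prints something like `--num-executors 59 --executor-cores 5 --executor-memory 18g`, and the question is whether handing that same config to every job, big or small, is a sensible default.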

Please share your views.


