Spark cluster resources: ensuring the system keeps memory/CPU

New to the Spark world! I am trying to set up a cluster that can be called upon to process applications of various sizes, using spark-submit to submit them and YARN to schedule and handle resource management.

The following is how I am submitting applications. I believe this says that, for this application, I am requesting 3 executors with 4 GB of memory and 5 cores each. Is it correct that this is per application?

spark-submit --master yarn --driver-memory 4G --executor-memory 4G --driver-cores 5 --num-executors 3 --executor-cores 5 some.py
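
For my own reference, here is the same command broken out with what I understand each flag to request (all scoped to this one submission):

spark-submit \
  --master yarn \          # hand scheduling and resource management to YARN
  --driver-memory 4G \     # JVM heap for the driver
  --driver-cores 5 \       # driver cores (per the docs, only honored in cluster deploy mode)
  --num-executors 3 \      # executors requested for this application
  --executor-memory 4G \   # JVM heap per executor
  --executor-cores 5 \     # concurrent task slots per executor
  some.py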

How do I ensure that YARN leaves enough memory and cores for GC, for YARN itself (NodeManager, ...), and for the rest of the system? Using yarn top I have seen 0 cores and 0 GB of memory available. Surely there are settings for this?

To summarize:

  1. Are the core and memory requests on a spark-submit scoped to an individual application run?
  2. Is there any config to ensure that YARN and the system keep resources for themselves? I feel like I need to reserve memory and cores for this (see the sketch after this list).
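
From what I have read so far, YARN does not reserve node resources on its own; instead you cap what each NodeManager is allowed to hand out, and whatever sits above the cap stays with the OS and the daemons. A minimal yarn-site.xml sketch, assuming an illustrative 16 GB / 8-core worker node (the 12288 MB and 6 vcore values are placeholders, not recommendations):

<!-- yarn-site.xml on each worker node -->
<property>
  <!-- memory YARN may allocate to containers on this node;
       the remaining ~4 GB stays with the OS, NodeManager, etc. -->
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>12288</value>
</property>
<property>
  <!-- vcores YARN may allocate to containers on this node;
       2 cores stay free for the system -->
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>6</value>
</property>

On the Spark side, the container YARN sizes for each executor is --executor-memory plus the off-heap overhead (spark.executor.memoryOverhead, which defaults to 10% of executor memory with a 384 MB floor), so the heap figure alone understates what each executor actually occupies.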

TIA



Solution 1:[1]

Here's my algorithm: for each base starting at 2, I add the current base to n, take the base-b logarithm of that sum to get an exponent, and record base^exponent; once the exponent drops to 1 there are no more powers worth checking, and I return the recorded power closest to n.

function closestPower(n) {
  // smallest perfect power with exponent >= 2 is 4 = 2^2
  if (n < 4) return 4
  const closest = []
  let base = 2
  while (base < n) {
    // nudge n up by the current base so a power just above n is still caught
    const exponent = Math.floor(Math.log(n + base) / Math.log(base))
    const power = Math.pow(base, exponent)
    if (exponent === 1) break // remaining bases would only yield first powers
    if (power === n) return n // n is itself a perfect power
    closest.push(power)
    base++
  }
  // return the candidate with the smallest absolute distance to n
  return closest.reduce((prev, curr) => (Math.abs(curr - n) < Math.abs(prev - n) ? curr : prev))
}

console.log(closestPower(0))
console.log(closestPower(9))
console.log(closestPower(30))
console.log(closestPower(34))
console.log(closestPower(56.5))
console.log(closestPower(123321456654))

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
