Scala code execution on the master of a Spark cluster?
The Spark application makes some API calls that do not use the SparkSession. I believe that when a piece of code doesn't use Spark, it gets executed on the master node.
Why do I want to know this? I am getting a Java heap space error while trying to POST some files using API calls, and I believe that if I upgrade the master node and increase the driver memory, it can be solved.
I want to understand how this type of application is executed on a Spark cluster. Is my understanding right, or am I missing something?
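As context for the driver-memory idea above: spark.driver.memory generally must be set before the driver JVM starts, so it is usually passed to spark-submit rather than set in application code. A minimal sketch, where the master URL, class name, jar, and the 4g figure are all illustrative placeholders:

```
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode client \
  --driver-memory 4g \
  --class com.example.UploadApp \
  upload-app.jar
```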
Solution 1:[1]
It depends. Closures/functions passed to the built-in higher-order function transform, code in any UDFs you create, and code in foreachBatch (and maybe a few other places) run on the workers. Other code runs on the driver.
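To make that distinction concrete, here is a minimal sketch (the postFile helper is hypothetical, standing in for the question's API calls):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object WhereCodeRuns {
  // Hypothetical helper: a plain HTTP POST with no Spark involvement.
  def postFile(path: String): Unit = { /* e.g. an HTTP client call */ }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("where-code-runs").getOrCreate()
    import spark.implicits._

    // Runs on the driver: ordinary Scala code outside any Spark operation.
    postFile("/tmp/report.json")

    val paths = Seq("/data/a.csv", "/data/b.csv").toDF("path")

    // Runs on the workers: the UDF body is serialized and shipped to the executors.
    val upload = udf((p: String) => { postFile(p); p })
    paths.select(upload($"path")).collect()

    spark.stop()
  }
}
```

One caveat on terminology: code like the first postFile call runs in the driver JVM, and whether that JVM lives on the master node depends on the deploy mode (in client mode it runs wherever spark-submit was launched; in cluster mode it runs on one of the workers). Increasing spark.driver.memory targets that heap either way.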
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Arnon Rotem-Gal-Oz |
