Run a Spark cluster using an independent YARN (without using Hadoop's YARN)
I want to deploy a Spark cluster with the YARN cluster manager. This Spark cluster needs to read data from an external HDFS filesystem that belongs to an existing Hadoop ecosystem, which also has its own YARN. However, I am not allowed to use that Hadoop cluster's YARN.
My questions are:
- Is it possible to run a Spark cluster using an independent YARN, while still reading data from an outside HDFS filesystem?
- If yes, is there any downside or performance penalty to this approach?
- If no, can I run Spark as a standalone cluster, and will there be any performance issue?
Assume both the Spark cluster and the Hadoop cluster are running in the same data center.
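To illustrate the setup I have in mind, the job submission would look roughly like this. The hostnames, paths, and class name below are placeholders, not a working configuration; the key ideas are that `HADOOP_CONF_DIR` points at my own independent YARN cluster's config, and the input path is a fully qualified `hdfs://` URI into the external HDFS (so it does not rely on that cluster's `fs.defaultFS`):

```shell
# Point Spark at the config of my OWN independent YARN cluster,
# not the external Hadoop cluster's config (path is a placeholder).
export HADOOP_CONF_DIR=/opt/my-yarn/conf

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyJob \
  my-job.jar \
  hdfs://external-namenode.example.com:8020/data/input
  # ^ fully qualified URI into the external Hadoop cluster's HDFS
```

Is this kind of split (compute on one YARN, storage on another cluster's HDFS) supported, or does it break assumptions Spark/YARN make about co-located storage?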
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
