Azure Databricks Architecture - Communication between control plane and data plane, and authentication
I am trying to understand the Azure Databricks architecture based on this link. I understand the purpose of the control plane and the data plane in the Azure Databricks architecture, but I couldn't find answers to the following questions.
How do the control plane and the data plane communicate?
How do the control plane and the data plane authenticate to each other?
Solution 1:[1]
There are two ways of communication between control plane & data plane:
- Legacy: the VMs running in the data plane have public IPs, and the control plane reaches them directly. This approach was always a security headache. Azure still supports it and shows it in the UI, but it shouldn't be used.
- "No Public IP (NPIP)", also called "Secure Cluster Connectivity" (doc and more technical details). In this case, when the VMs in the data plane start, they open a bi-directional tunnel to a relay in the control plane, and that tunnel is always used for controlling the VMs and Spark. In this setup the VMs don't need public IPs, which makes it much more secure and easier to control.
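To make the direction of the connection in the second option concrete, here is a minimal sketch in Python. It is not how Databricks implements its relay. It only illustrates the general pattern in which the data-plane side dials out and the control-plane side then sends commands back over that same connection. The names `run_relay`, `run_worker`, and `demo`, the `start-spark` command, and the line-based protocol are all hypothetical and chosen purely for illustration.

```python
import socket
import threading


def run_relay(server_sock, command, result):
    """Hypothetical control-plane relay: accepts one inbound tunnel and uses it."""
    conn, _ = server_sock.accept()  # the relay only accepts; it never dials the VM
    with conn:
        # Control plane -> VM, sent over the connection the VM opened.
        conn.sendall(command.encode() + b"\n")
        # VM -> control plane reply on the same bi-directional connection.
        with conn.makefile("r") as reader:
            result.append(reader.readline().strip())


def run_worker(relay_addr):
    """Hypothetical data-plane VM: outbound connection only, no listening socket."""
    with socket.create_connection(relay_addr) as sock:
        with sock.makefile("rw") as stream:
            command = stream.readline().strip()  # wait for a command from the relay
            stream.write(f"ack:{command}\n")     # acknowledge over the same tunnel
            stream.flush()


def demo():
    # The relay listens on localhost here; in the real service it lives in the control plane.
    server_sock = socket.socket()
    server_sock.bind(("127.0.0.1", 0))
    server_sock.listen(1)
    result = []
    relay = threading.Thread(target=run_relay, args=(server_sock, "start-spark", result))
    relay.start()
    run_worker(server_sock.getsockname())  # the worker initiates the connection
    relay.join()
    server_sock.close()
    return result[0]


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is that only the relay has a listening socket. The worker never accepts inbound connections, which is why VMs in this model don't need public IPs.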
Regarding authentication: it's an internal detail, but it provides a way of ensuring that the VMs communicating with the control plane really are the VMs that form the cluster.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Alex Ott |
