How to access ADLS blob containers from Databricks using User Assigned Identity

I have an ADLS storage account with blob containers. I have successfully mounted the ADLS storage in Databricks with a service principal and am able to run the necessary transformations on the data.

Now I'm in the process of moving to user-assigned managed identities to avoid keeping secrets in my code. For this, I have created the required managed identity and enabled it for my service principal by assigning the necessary role on the storage account.

My question is: how can I use the managed identity, i.e. how can I run my transformations on the ADLS storage from Databricks without mounting or using secrets?

Please suggest a working solution or point me to a helpful forum.

Thanks.



Solution 1:[1]

You can authenticate automatically to Azure Data Lake Storage Gen1 (ADLS Gen1) and Azure Data Lake Storage Gen2 (ADLS Gen2) from Azure Databricks clusters using the same Azure Active Directory (Azure AD) identity that you use to log in to Azure Databricks. When you enable Azure Data Lake Storage credential passthrough for a cluster, commands that you run on that cluster can read and write data in Azure Data Lake Storage without requiring you to configure service principal credentials for storage access.

Enable Azure Data Lake Storage credential passthrough for a High Concurrency cluster

High Concurrency clusters can be shared by multiple users. They support only Python and SQL when Azure Data Lake Storage credential passthrough is enabled.

  1. When you create a cluster, set Cluster Mode to High Concurrency.
  2. Under Advanced Options, select Enable credential passthrough for user-level data access and only allow Python and SQL commands (see the API sketch after this list).
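
If you provision clusters through the Clusters API rather than the UI, the same settings map to Spark configuration keys. Below is a minimal sketch of a request body for POST /api/2.0/clusters/create; the spark_conf keys are the passthrough-related settings documented for High Concurrency clusters, while the cluster name, Spark version, node type, and worker count are placeholder assumptions:

```json
{
  "cluster_name": "hc-passthrough-cluster",
  "spark_version": "10.4.x-scala2.12",
  "node_type_id": "Standard_DS3_v2",
  "num_workers": 2,
  "spark_conf": {
    "spark.databricks.cluster.profile": "serverless",
    "spark.databricks.repl.allowedLanguages": "python,sql",
    "spark.databricks.passthrough.enabled": "true",
    "spark.databricks.pyspark.enableProcessIsolation": "true"
  }
}
```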

Enable Azure Data Lake Storage credential passthrough for a Standard cluster

  1. When you create a cluster, set the Cluster Mode to Standard.
  2. Under Advanced Options, select Enable credential passthrough for user-level data access and select the user name from the Single User Access drop-down (see the sketch after this list).
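
As a similar hedged sketch for API-based provisioning: a Standard cluster with passthrough sets spark.databricks.passthrough.enabled and assigns the single user through the single_user_name field; the remaining values are placeholder assumptions:

```json
{
  "cluster_name": "standard-passthrough-cluster",
  "spark_version": "10.4.x-scala2.12",
  "node_type_id": "Standard_DS3_v2",
  "num_workers": 2,
  "single_user_name": "user@contoso.com",
  "spark_conf": {
    "spark.databricks.passthrough.enabled": "true"
  }
}
```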

Access Azure Data Lake Storage directly using credential passthrough

After configuring Azure Data Lake Storage credential passthrough and creating storage containers, you can access data directly in Azure Data Lake Storage Gen1 using an adl:// path and Azure Data Lake Storage Gen2 using an abfss:// path.

Example (Python):

```python
spark.read.csv("adl://<storage-account-name>.azuredatalakestore.net/MyData.csv").collect()
```
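
The example above uses the Gen1 adl:// scheme. For ADLS Gen2, the equivalent read goes through the abfss:// scheme against the storage account's dfs endpoint; the container and account names below are placeholders:

```python
# Runs on a cluster with credential passthrough enabled; no secrets in code.
# <container-name> and <storage-account-name> are placeholders for your resources.
df = spark.read.csv(
    "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/MyData.csv"
)
df.show()
```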

Refer to the official documentation: Access Azure Data Lake Storage using Azure Active Directory credential passthrough.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: UtkarshPal-MT