🔐 How to Access Azure Data Lake from Databricks: A Complete Overview with Real-World Examples

Accessing Azure Data Lake Gen2 (ADLS Gen2) from Azure Databricks is a critical component of modern data engineering and analytics workflows. Depending on your security needs and use cases, there are multiple authentication methods available — each offering different levels of control, flexibility, and governance.

In this article, we’ll break down each method and explain how and when to use it, with practical examples.


📁 1. Access via Access Keys, SAS Tokens, or Service Principals

These are the most direct options and are often used for development or one-off scripts.

🔑 Storage Access Key

spark.conf.set("fs.azure.account.key.<storage_account>.dfs.core.windows.net", "<access_key>")

🔐 Shared Access Signature (SAS Token)

spark.conf.set("fs.azure.sas.<container>.<storage_account>.dfs.core.windows.net", "<sas_token>")

👤 Service Principal Authentication

spark.conf.set("fs.azure.account.auth.type", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id", "<client_id>")
spark.conf.set("fs.azure.account.oauth2.client.secret", "<client_secret>")
spark.conf.set("fs.azure.account.oauth2.client.endpoint", "https://login.microsoftonline.com/<tenant_id>/oauth2/token")

📌 Use case: Automation scripts, service-to-service authentication, or pipelines in dev/test environments.


🧪 2. Session Scoped Authentication

With this method, the credentials apply only to the current notebook’s Spark session, which allows user-specific access and better isolation. Rather than hard-coding the client secret, pull it from a Databricks secret scope:

spark.conf.set("fs.azure.account.oauth2.client.secret", dbutils.secrets.get(scope="my-scope", key="sp-secret"))

📌 Use case: Interactive notebooks, experimentation, and personal access sessions with secure secrets.


🧱 3. Cluster Scoped Authentication

Here, the credentials are set at the cluster level, making them available to all notebooks running on that cluster.

🔧 Add the following under the cluster’s Advanced Options > Spark Config (one key-value pair per line):

spark.hadoop.fs.azure.account.auth.type OAuth
spark.hadoop.fs.azure.account.oauth.provider.type org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
spark.hadoop.fs.azure.account.oauth2.client.id <client_id>
spark.hadoop.fs.azure.account.oauth2.client.secret <client_secret>
spark.hadoop.fs.azure.account.oauth2.client.endpoint https://login.microsoftonline.com/<tenant_id>/oauth2/token
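
Rather than pasting the raw secret into the cluster configuration, you can reference a secret scope using the {{secrets/<scope>/<key>}} syntax that Databricks supports in Spark config (my-scope and sp-secret are placeholder names), so the secret line above becomes:

spark.hadoop.fs.azure.account.oauth2.client.secret {{secrets/my-scope/sp-secret}}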

📌 Use case: Shared jobs and collaborative environments where multiple users need the same access level.


👥 4. Azure Active Directory (AAD) Passthrough Authentication

AAD Passthrough allows users to authenticate to ADLS Gen2 using their individual Azure AD identity, ensuring fine-grained access control.

🔧 To enable:

  • Use a Premium Tier Databricks workspace
  • Enable Credential Passthrough in cluster settings
  • Ensure RBAC/ACL permissions are set on the storage account
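
Once enabled, no credentials appear in notebook code at all; access is evaluated against the signed-in user’s RBAC/ACL permissions when a path is read, along the lines of:

# No spark.conf calls required; the user's own AAD identity is used
df = spark.read.parquet("abfss://<container>@<storage_account>.dfs.core.windows.net/<path>")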

📌 Use case: Enterprises with strict identity-based access control and audit requirements.


🧩 5. Unity Catalog Integration

Unity Catalog offers centralized governance for all data assets. It enhances security and data management with features like:

  • Row and column-level access control
  • Built-in data lineage
  • Central metadata management
  • Access control via Azure AD groups

📌 Use case: Enterprise-wide data platforms that need centralized access control across teams and workspaces.

-- Example SQL command
GRANT SELECT ON CATALOG marketing_data TO `data-analysts`
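
Under Unity Catalog, the connection to ADLS Gen2 itself is typically configured once by an administrator through a storage credential and an external location, so end users never handle keys. A minimal sketch, assuming a storage credential named my_credential already exists (all names here are placeholders):

spark.sql("""
    CREATE EXTERNAL LOCATION IF NOT EXISTS my_location
    URL 'abfss://<container>@<storage_account>.dfs.core.windows.net/<path>'
    WITH (STORAGE CREDENTIAL my_credential)
""")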

📋 6. Summary of Access Methods

Access Method       | Scope            | Security Level | Ideal For
--------------------|------------------|----------------|----------------------------------------------
Access Keys         | Account-wide     | 🔓 Low          | Dev/test environments
SAS Tokens          | Time-limited     | ⚠️ Medium       | Temporary access scenarios
Service Principal   | App/Cluster      | 🔐 High         | Automated jobs and production pipelines
Session Scoped Auth | Notebook session | 🔐 High         | Isolated, secure, and temporary notebook runs
Cluster Scoped Auth | Shared clusters  | 🔐 Medium       | Shared access across a team
AAD Passthrough     | User-based       | 🔐 Very High    | Fine-grained, identity-based access
Unity Catalog       | Platform-wide    | 🔐 Very High    | Enterprise data governance and access control

✅ Recommended Approach for Enterprise Projects

For production-grade solutions:

  • Use Unity Catalog if you require centralized access control.
  • Use AAD Passthrough if you need user-specific access enforcement.
  • Use Service Principal with secrets or Key Vaults for automation and pipelines.
  • Avoid using access keys or SAS tokens in production unless temporarily required.

📎 Final Notes

Azure Databricks supports multiple flexible and secure ways to access Azure Data Lake Gen2. The right approach depends on your environment, data governance needs, and the scale of your organization.

As a best practice, always:

  • Use Azure Key Vault or Databricks Secrets to store sensitive credentials.
  • Prefer AAD and Unity Catalog in enterprise-grade architectures.
  • Apply RBAC and ACLs at the storage level to reinforce security.
