🔐 How to Access Azure Data Lake Gen2 from Databricks: Authentication Methods, Secrets & Recommended Patterns
Accessing Azure Data Lake Storage Gen2 (ADLS Gen2) securely and efficiently from Azure Databricks is essential for enterprise data platforms. The right access pattern depends on the use case: a student subscription, a data engineering team in a corporate environment, and secure production workloads each call for a different approach.
In this blog, we’ll break down each authentication method, explain Databricks secrets management, and share recommended patterns for different subscription types.
🚀 1. Accessing ADLS Gen2 Using Access Keys
Access keys are the most basic authentication method, available by default with every Azure Storage account.
🔐 Key Features:
- Two keys are provided per storage account
- Grant full access to the entire storage account
- Keys can be rotated periodically for security
🔧 Spark Configuration:
spark.conf.set("fs.azure.account.key.<STORAGE_ACCOUNT>.dfs.core.windows.net", "<ACCESS_KEY>")
🧩 abfs Driver Path Example:
abfss://<container>@<storage_account>.dfs.core.windows.net/<path>
Example:
abfss://demo@formula1dl.dfs.core.windows.net/test/
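As a minimal notebook sketch, the key can be pulled from a Databricks secret scope rather than hard-coded. The scope name formula1-scope and key name dl-account-key below are placeholders, not real resources:
# Placeholder secret scope and key names; substitute your own
storage_account = "formula1dl"
access_key = dbutils.secrets.get(scope="formula1-scope", key="dl-account-key")

# Register the access key for this Spark session
spark.conf.set(f"fs.azure.account.key.{storage_account}.dfs.core.windows.net", access_key)

# Verify access by listing the test folder in the demo container
display(dbutils.fs.ls(f"abfss://demo@{storage_account}.dfs.core.windows.net/test/"))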
🔐 2. Access Using SAS Tokens (Shared Access Signature)
SAS tokens provide more granular control over what operations can be done and for how long.
🔑 Key Benefits:
- Restrict access to certain operations (read/write)
- Limit access to specific time frames or IP ranges
- Ideal for temporary or external access
🔧 Spark Configuration:
spark.conf.set("fs.azure.account.auth.type.<storage_account>.dfs.core.windows.net", "SAS")
spark.conf.set("fs.azure.sas.token.provider.type.<storage_account>.dfs.core.windows.net",
"org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider")
spark.conf.set("fs.azure.sas.fixed.token.<storage_account>.dfs.core.windows.net", "<SAS_TOKEN>")
🧑‍💼 3. Access Using Service Principal (via Azure AD)
This is a more secure and enterprise-ready method of authentication.
🛠 Setup Steps:
- Register an Azure AD App (Service Principal)
- Generate a client secret or certificate
- Assign the Storage Blob Data Contributor role to the service principal at the storage account (or container) level
- Configure the Spark cluster or notebook:
spark.conf.set("fs.azure.account.auth.type", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id", "<client_id>")
spark.conf.set("fs.azure.account.oauth2.client.secret", "<client_secret>")
spark.conf.set("fs.azure.account.oauth2.client.endpoint", "https://login.microsoftonline.com/<tenant_id>/oauth2/token")
🔁 4. Cluster Scoped vs Session Scoped Authentication
📦 Cluster Scoped Authentication
- Applies to all notebooks in the cluster
- Set in the cluster’s Spark Config (see the sketch after this list)
📌 Use Case: Shared compute environments, scheduled jobs
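As a sketch of what that looks like, cluster Spark config entries typically carry the spark.hadoop. prefix for these Hadoop-level settings, and can reference a secret with the {{secrets/<scope>/<key>}} syntax instead of a pasted key (the scope and key names are placeholders):
spark.hadoop.fs.azure.account.key.formula1dl.dfs.core.windows.net {{secrets/formula1-scope/dl-account-key}}
Every notebook and job running on that cluster can then read abfss:// paths without any per-notebook configuration.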
🧪 Session Scoped Authentication
- Applies only to the current notebook session
- Secrets fetched via dbutils.secrets.get (see the helper sketch below)
📌 Use Case: Personalized access, higher isolation
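A common session-scoped pattern is to wrap the configuration in a small helper that each notebook calls at the top; the function, scope, and key names below are illustrative only:
def set_session_adls_auth(storage_account, scope, key_name):
    """Configure access-key auth for one storage account, for this notebook session only."""
    account_key = dbutils.secrets.get(scope=scope, key=key_name)
    spark.conf.set(f"fs.azure.account.key.{storage_account}.dfs.core.windows.net", account_key)

set_session_adls_auth("formula1dl", "formula1-scope", "dl-account-key")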
👥 5. AAD Credential Passthrough
AAD Passthrough lets individual users access storage using their own Azure AD credentials.
✅ Benefits:
- RBAC-enforced access
- Logs are attributed to specific users
- Fine-grained controls such as row- and column-level security are provided by Unity Catalog, which Databricks now recommends over passthrough
⚠️ Only available on Premium-tier workspaces
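Once passthrough is enabled on the cluster and the user holds an RBAC role such as Storage Blob Data Reader on the account, a notebook reads the path directly; a minimal sketch:
# No keys, tokens, or service principals configured in the notebook:
# the ABFS driver authenticates with the signed-in user's Azure AD identity
display(dbutils.fs.ls("abfss://demo@formula1dl.dfs.core.windows.net/test/"))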
✅ 6. Recommended Access Patterns
The method you choose depends on the type of subscription and enterprise requirements.
| Subscription Type | Recommended Method |
|---|---|
| Student subscription | Cluster-scoped auth with access keys |
| Company without AAD integration | Cluster-scoped auth |
| Free or Pay-as-you-go | Service Principal |
| Enterprise (AAD enabled) | AAD Passthrough / Unity Catalog |
🔐 7. Securing Secrets in Databricks
Databricks supports secret management via:
🧰 Databricks Secret Scope
- Stored securely in Databricks
- Access via dbutils.secrets.get("scope-name", "key-name")
🔐 Azure Key Vault Integration
- Secrets stored in Azure-managed Key Vault
- Useful for centralized and externally managed secrets
🔁 Secret Scope Overview
Notebooks/Jobs/Clusters
↓
Databricks Secret Scope
↓
Azure Key Vault (optional)
You can manage secrets using:
- Databricks CLI
- Workspace UI
- Terraform or ARM templates
📘 Step-by-Step: Setting Up Secrets
| Step | Description |
|---|---|
| 1 | Create an Azure Key Vault |
| 2 | Create a Databricks secret scope |
| 3 | Add secrets via the Azure Portal or CLI |
| 4 | Access secrets in a notebook using dbutils.secrets.get |
| 5 | Apply secrets to cluster-wide Spark configurations |
🧠 Final Thoughts
In a real-world Databricks project, selecting the right authentication method is key to data security, compliance, and operational flexibility. Here’s a quick wrap-up:
- Use Access Keys or SAS tokens only for short-term/testing.
- Use Service Principal for automation and long-running jobs.
- Use AAD Passthrough or Unity Catalog for enterprise-grade security.
- Always secure secrets with Databricks Secret Scope or Azure Key Vault.