
🔐 How to Access Azure Data Lake Gen2 from Databricks: Authentication Methods, Secrets & Recommended Patterns

Accessing Azure Data Lake Gen2 (ADLS Gen2) securely and efficiently from Azure Databricks is essential for enterprise data platforms. Depending on the use case—whether you’re a student, a data engineer in a corporate team, or running secure production workloads—there are various access patterns to choose from.

In this blog, we’ll break down each authentication method, explain Databricks secrets management, and share recommended access patterns for different subscription types.


🚀 1. Accessing ADLS Gen2 Using Access Keys

Access keys are the most basic authentication method, available by default with every Azure Storage account.

🔐 Key Features:

  • Two keys are provided per storage account
  • Full access to the account
  • Keys can be rotated periodically for security

🔧 Spark Configuration:

spark.conf.set("fs.azure.account.key.<STORAGE_ACCOUNT>.dfs.core.windows.net", "<ACCESS_KEY>")

🧩 ABFS Driver Path Example:

abfss://<container>@<storage_account>.dfs.core.windows.net/<path>

Example:

abfss://demo@formula1dl.dfs.core.windows.net/test/
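Putting it together, here is a minimal sketch that reads a file once the key is set. The account and container reuse the example above; the file name circuits.csv is illustrative, and in real work the key should come from a secret scope (see section 7) rather than being pasted in:

# Minimal sketch: configure the access key, then read a CSV from the container.
# The file path is an assumption for illustration.
spark.conf.set(
    "fs.azure.account.key.formula1dl.dfs.core.windows.net",
    "<ACCESS_KEY>"  # for testing only; fetch from a secret scope in practice
)
df = spark.read.csv(
    "abfss://demo@formula1dl.dfs.core.windows.net/test/circuits.csv",
    header=True
)
display(df)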

🔐 2. Access Using SAS Tokens (Shared Access Signature)

SAS tokens provide more granular control over what operations can be done and for how long.

🔑 Key Benefits:

  • Restrict access to certain operations (read/write)
  • Limit access to specific time frames or IP ranges
  • Ideal for temporary or external access

🔧 Spark Configuration:

spark.conf.set("fs.azure.account.auth.type.<storage_account>.dfs.core.windows.net", "SAS")
spark.conf.set("fs.azure.sas.token.provider.type.<storage_account>.dfs.core.windows.net", 
    "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider")
spark.conf.set("fs.azure.sas.fixed.token.<storage_account>.dfs.core.windows.net", "<SAS_TOKEN>")

🧑‍💼 3. Access Using Service Principal (via Azure AD)

This is a more secure and enterprise-ready method of authentication.

🛠 Setup Steps:

  1. Register an Azure AD App (Service Principal)
  2. Generate a client secret or certificate
  3. Assign the Storage Blob Data Contributor role to the service principal on the storage account (or on a specific container for tighter scope)
  4. Configure the Spark cluster or notebook:
spark.conf.set("fs.azure.account.auth.type", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id", "<client_id>")
spark.conf.set("fs.azure.account.oauth2.client.secret", "<client_secret>")
spark.conf.set("fs.azure.account.oauth2.client.endpoint", "https://login.microsoftonline.com/<tenant_id>/oauth2/token")

🔁 4. Cluster Scoped vs Session Scoped Authentication

📦 Cluster Scoped Authentication

  • Applies to all notebooks in the cluster
  • Set in the cluster’s Spark Config

📌 Use Case: Shared compute environments, scheduled jobs
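In the cluster’s Spark Config box (under Advanced Options), each property goes on its own line, and Databricks can substitute a secret at cluster start using the {{secrets/<scope>/<key>}} reference syntax, so the key never appears in plain text:

fs.azure.account.key.<storage_account>.dfs.core.windows.net {{secrets/<scope>/<key>}}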


🧪 Session Scoped Authentication

  • Applies only to the current notebook session
  • Secrets fetched via dbutils.secrets.get

📌 Use Case: Personalized access, higher isolation
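A session-scoped setup typically looks like the following; the scope and key names are illustrative:

# Session-scoped sketch: this configuration lives only in the current
# notebook's Spark session and does not affect other notebooks on the cluster.
account_key = dbutils.secrets.get(scope="formula1-scope", key="storage-account-key")
spark.conf.set("fs.azure.account.key.formula1dl.dfs.core.windows.net", account_key)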


👥 5. AAD Credential Passthrough

AAD Passthrough lets individual users access storage using their own Azure AD credentials.

✅ Benefits:

  • RBAC-enforced access
  • Logs are attributed to specific users
  • Row-level and column-level security becomes available by moving to Unity Catalog, which supersedes passthrough

⚠️ Only available on Premium Workspaces
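Passthrough is enabled per cluster rather than per notebook. On supported cluster types it can be switched on in the cluster UI, which corresponds to this Spark configuration entry:

spark.databricks.passthrough.enabled true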


✅ 6. Recommended Access Patterns

The method you choose depends on the type of subscription and enterprise requirements.

| Subscription Type        | Recommended Method                   |
|--------------------------|--------------------------------------|
| Student Subscription     | Cluster Scoped Auth with Access Keys |
| Company w/o AAD          | Cluster Scoped Auth                  |
| Free or Pay-as-you-go    | Service Principal                    |
| Enterprise (AAD enabled) | AAD Passthrough / Unity Catalog      |

🔐 7. Securing Secrets in Databricks

Databricks supports secret management via:

🧰 Databricks Secret Scope

  • Stored securely in Databricks
  • Access via: dbutils.secrets.get("scope-name", "key-name")
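A couple of helper calls make scopes discoverable from a notebook; secret values themselves are redacted if printed. The scope name below is illustrative:

# Explore available scopes and keys, then fetch a value.
dbutils.secrets.listScopes()
dbutils.secrets.list("formula1-scope")
account_key = dbutils.secrets.get("formula1-scope", "storage-account-key")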

🔐 Azure Key Vault Integration

  • Secrets stored in Azure-managed Key Vault
  • Useful for centralized and externally managed secrets
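A Key Vault-backed scope can be created from the workspace UI (the #secrets/createScope page) or with the legacy Databricks CLI; flag names differ in newer CLI versions, and the resource ID and DNS name below are placeholders:

databricks secrets create-scope --scope formula1-scope \
  --scope-backend-type AZURE_KEYVAULT \
  --resource-id "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.KeyVault/vaults/<vault-name>" \
  --dns-name "https://<vault-name>.vault.azure.net/"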

🔁 Secret Scope Overview

Notebooks/Jobs/Clusters 
      ↓
Databricks Secret Scope
      ↓
Azure Key Vault (optional)

You can manage secrets using:

  • Databricks CLI
  • Workspace UI
  • Terraform or ARM templates

📘 Step-by-Step: Setting Up Secrets

| Step | Description                                   |
|------|-----------------------------------------------|
| 1    | Create an Azure Key Vault                     |
| 2    | Create a Databricks Secret Scope              |
| 3    | Add secrets via the Azure Portal or CLI       |
| 4    | Access secrets in a notebook using dbutils    |
| 5    | Apply secrets to cluster-wide configurations  |
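For steps 3 and 4, a hedged example using the legacy CLI (newer CLI versions rename this command to put-secret) followed by a notebook cell; scope and key names are illustrative:

# CLI: store a secret in a Databricks-backed scope (prompts for the value)
databricks secrets put --scope formula1-scope --key storage-account-key

# Notebook: read it back for use in Spark configuration
account_key = dbutils.secrets.get("formula1-scope", "storage-account-key")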

🧠 Final Thoughts

In a real-world Databricks project, selecting the right authentication method is key to data security, compliance, and operational flexibility. Here’s a quick wrap-up:

  • Use Access Keys or SAS tokens only for short-term/testing.
  • Use Service Principal for automation and long-running jobs.
  • Use AAD Passthrough or Unity Catalog for enterprise-grade security.
  • Always secure secrets with Databricks Secret Scope or Azure Key Vault.