📁 Databricks Mounts: Final Concepts, Benefits & Usage Patterns
Mounts are a key feature of Azure Databricks, enabling seamless integration between the Databricks File System (DBFS) and Azure Data Lake Storage Gen2 or Blob Storage. With a mount, external storage is treated like a local directory (e.g., /mnt/storage1), which simplifies file access for notebooks, jobs, and pipelines.
Let’s wrap up by looking at real usage, security, and recommended best practices.
🧪 DBFS Root Demo: What You Need to Know
The DBFS root is the default root directory of the Databricks File System. It is addressed as dbfs:/ from Spark and dbutils, and is exposed on the driver through the local path:
/dbfs/
⚠️ Considerations:
- DBFS root is not optimized for external big data storage
- Use mounts for scalable, secure, and structured data access
- DBFS root is good for: temporary storage, logs, notebook outputs
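To make this concrete, here is a minimal sketch of typical DBFS-root usage in a notebook; the /tmp/demo path and file contents are purely illustrative:

```python
# List the top level of the DBFS root.
display(dbutils.fs.ls("dbfs:/"))

# Write a small temporary file (fine for logs or notebook outputs,
# but not a home for large production datasets).
dbutils.fs.put("dbfs:/tmp/demo/run_log.txt", "job started", overwrite=True)

# The same file is visible on the driver via the /dbfs/ local path (FUSE).
with open("/dbfs/tmp/demo/run_log.txt") as f:
    print(f.read())
```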
🧩 Databricks Mount Architecture (Visual Overview)
The mount definition is created in the Control Plane, but it connects to the Data Plane, where your Azure resources (such as Data Lake Storage Gen2) live.
Architecture Flow:
- Databricks notebooks access data through DBFS
- DBFS exposes a mounted path such as /mnt/storage1
- That path is securely connected to Azure Data Lake Storage Gen2 or Blob Storage
- Credentials are passed securely (e.g., via a Service Principal)
- Azure handles the underlying storage access
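You can inspect how these pieces are wired together in a workspace with dbutils.fs.mounts(), which lists each mount point and the storage location it resolves to; the /mnt/storage1 check below is just an illustration:

```python
# List every mount point and the cloud storage location it points to.
for m in dbutils.fs.mounts():
    print(m.mountPoint, "->", m.source)

# Check whether a specific mount already exists.
is_mounted = any(m.mountPoint == "/mnt/storage1" for m in dbutils.fs.mounts())
print("Is /mnt/storage1 mounted?", is_mounted)
```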
✅ Benefits of Using Mounts
| Benefit | Explanation |
|---|---|
| No need to pass credentials every time | Once mounted, paths are persistent |
| Access files like a local filesystem | Use dbutils.fs.ls("/mnt/yourpath") instead of full storage URLs |
| Cloud-native performance | Mounts use Azure storage APIs in the background |
| Works with all notebooks | Share mounts across notebooks, jobs, and users |
| Supports secrets | Integrate with Azure Key Vault-backed or Databricks secret scopes |
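For example, once /mnt/storage1 is mounted, a notebook can browse and read it without referencing the storage account URL or any credential; the raw/sales.csv path below is hypothetical:

```python
# Browse the mounted container as if it were a local directory.
display(dbutils.fs.ls("/mnt/storage1/raw/"))

# Read a file through the mount; no account URL or credentials appear in the notebook.
df = spark.read.format("csv").option("header", "true").load("/mnt/storage1/raw/sales.csv")
df.show(5)
```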
🔐 Recommended Solution Before Unity Catalog
Mounts were the preferred way to access external storage before Unity Catalog became generally available in 2022.
⚠️ Unity Catalog now provides fine-grained, centralized governance and is recommended for new production workloads.
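As a point of contrast, a Unity Catalog workload typically addresses data through the three-level catalog.schema.table namespace rather than a mount path; the names below are placeholders:

```python
# With Unity Catalog, governed data is referenced by name, not by /mnt/... paths.
df = spark.table("main.sales.orders")  # catalog.schema.table (placeholder names)
df.show(5)
```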
🔄 End-to-End Flow for Mounting Azure Data Lake Gen2
Step-by-Step Logic
- A notebook requests access via /mnt/storage1
- /mnt/storage1 is a mount pointing to a container in Azure Data Lake Gen2
- The mount uses credentials (e.g., a Service Principal or SAS token)
- Those credentials allow access to the container
- Files are listed, read, or written from the notebook
💡 How to Mount ADLS Gen2 with Service Principal (Code Example)
# Spark configuration for OAuth (Service Principal) access to ADLS Gen2.
# The client secret comes from a Databricks secret scope; never hardcode it.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<client-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="my-secrets", key="client-secret"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token"
}

# Mount the container at /mnt/storage1 using the abfss (ADLS Gen2) endpoint.
dbutils.fs.mount(
    source = "abfss://<container-name>@<storage-account>.dfs.core.windows.net/",
    mount_point = "/mnt/storage1",
    extra_configs = configs
)
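Once the mount command succeeds, the container is usable through the mount point; the quick check below is a sketch, and the bronze/events folder is illustrative:

```python
# Verify the mount by listing its contents.
display(dbutils.fs.ls("/mnt/storage1"))

# Read Parquet data through the mount path.
df = spark.read.parquet("/mnt/storage1/bronze/events/")
print(df.count())
```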
🧹 Unmounting and Cleanup
# To unmount a directory
dbutils.fs.unmount("/mnt/storage1")
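Unmounting a path that is not actually mounted raises an error, so cleanup jobs often guard the call; a minimal sketch:

```python
# Unmount only if the mount point actually exists.
mount_point = "/mnt/storage1"
if any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
    dbutils.fs.unmount(mount_point)
    print(f"Unmounted {mount_point}")
else:
    print(f"{mount_point} is not mounted, nothing to do")
```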
📘 Best Practices
| Practice | Why It Matters |
|---|---|
| Use /mnt/... for all external sources | Provides filesystem-like access |
| Use secrets for credentials | Never hardcode sensitive data |
| Manage mounts with cluster init scripts or jobs | Automate for consistency (see the helper sketch below) |
| Prefer Unity Catalog for new implementations | Supports RBAC and column-level security |
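To support the automation point above, here is a minimal, idempotent mount helper you might run from a job; it assumes the configs dictionary from the earlier example, and all names are placeholders:

```python
# A small helper that only mounts storage if it is not mounted yet,
# so repeated job runs stay idempotent.
def ensure_mount(source: str, mount_point: str, configs: dict) -> None:
    if any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
        print(f"{mount_point} already mounted, skipping")
        return
    dbutils.fs.mount(source=source, mount_point=mount_point, extra_configs=configs)
    print(f"Mounted {source} at {mount_point}")

ensure_mount(
    source="abfss://<container-name>@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/storage1",
    configs=configs,  # the OAuth configs dictionary defined earlier
)
```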
🎯 Final Thoughts
Mounts in Databricks offer a powerful yet simple abstraction for accessing external data lakes. Whether you’re preparing data for machine learning or ingesting large datasets for analytics, mounting your ADLS or Blob containers combines cloud-native performance with local-path convenience.