
Databricks File System (DBFS) and Mounting Azure Data Lake Containers

Databricks offers a virtual distributed file system called DBFS, which simplifies data access for notebooks, jobs, and clusters. To integrate external cloud storage such as Azure Data Lake Storage Gen2 (ADLS Gen2), you create mounts, which let you browse and work with cloud data as if it were part of a local file system.

Let’s dive into what DBFS is, how mounts work, and how to securely mount ADLS containers.


📦 What is Databricks File System (DBFS)?

DBFS is a distributed file system built on top of Azure Blob Storage, enabling seamless access to data stored in the Databricks workspace.

📌 Key Characteristics:

  • DBFS is mounted into each Databricks workspace and is available on its clusters.
  • It acts as a unified interface for both ephemeral (cluster-local) and persistent storage.
  • It can be accessed from notebooks, clusters, and jobs.
  • It includes directories such as /mnt (mount points) and /databricks. On cluster nodes, the same tree is typically also exposed at the local path /dbfs.
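
The two path styles in the last bullet are easy to mix up. As a rough sketch, assuming the usual convention that dbfs:/... is the URI form used by Spark and dbutils while /dbfs/... is the local view of the same files on a cluster node, the conversion looks like this. The function names to_local_path and to_dbfs_uri are made up for illustration and are not part of any Databricks API.

```python
def to_local_path(dbfs_uri: str) -> str:
    """Convert a DBFS URI such as 'dbfs:/mnt/mydata/a.csv' to its local form '/dbfs/mnt/mydata/a.csv'."""
    prefix = "dbfs:/"
    if not dbfs_uri.startswith(prefix):
        raise ValueError(f"not a DBFS URI: {dbfs_uri!r}")
    # Drop the scheme, then re-root the remainder under /dbfs/.
    return "/dbfs/" + dbfs_uri[len(prefix):].lstrip("/")


def to_dbfs_uri(local_path: str) -> str:
    """Convert a local path such as '/dbfs/mnt/mydata/a.csv' back to 'dbfs:/mnt/mydata/a.csv'."""
    prefix = "/dbfs/"
    if not local_path.startswith(prefix):
        raise ValueError(f"not a /dbfs path: {local_path!r}")
    return "dbfs:/" + local_path[len(prefix):]


print(to_local_path("dbfs:/mnt/mydata/file.csv"))
print(to_dbfs_uri("/dbfs/mnt/mydata/file.csv"))
```

The local form is what plain Python file APIs such as open() expect, while Spark readers and dbutils.fs generally take the dbfs:/ form.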

🔍 DBFS Architecture Overview

Control Plane

  • Managed by Databricks
  • Includes the workspace UI, cluster manager, and job scheduler

Data Plane

  • Resides in your Azure subscription
  • Includes VMs, Spark compute, and actual storage access
  • ADLS and Azure Blob act as the backend

📁 What is DBFS Root?

The DBFS root (dbfs:/, which appears as /dbfs/ on a cluster's local file system) is the default storage location for files in a Databricks workspace, backed by Azure Blob Storage.

✅ Features:

  • Accessible via the Databricks Web UI (e.g., Data tab > DBFS)
  • Ideal for notebooks, libraries, input/output files
  • Used by default when saving data temporarily

❗ Considerations:

  • Not recommended for long-term production storage
  • Use mounts instead for accessing external storage like ADLS Gen2

🔗 What Are Databricks Mounts?

Mounts are persistent pointers that link external cloud storage (such as Azure Data Lake or Azure Blob Storage) to a specific directory in DBFS, by convention under /mnt.

Once mounted:

  • You can use regular file system commands (%fs ls, dbutils.fs.ls(), etc.)
  • You don’t need to re-authenticate or manage tokens repeatedly
  • Mounts persist across sessions and clusters
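
To check which mounts already exist, dbutils.fs.mounts() returns entries that carry a mountPoint and a source. The helper below is a small sketch rather than a Databricks API: find_mount_source is a hypothetical name, and the MountInfo tuple is only a stand-in so the example also runs outside a notebook.

```python
from typing import Iterable, NamedTuple, Optional


class MountInfo(NamedTuple):
    """Stand-in for entries returned by dbutils.fs.mounts() (assumption: only these two fields are needed)."""
    mountPoint: str
    source: str


def find_mount_source(mounts: Iterable[MountInfo], mount_point: str) -> Optional[str]:
    """Return the storage URI behind mount_point, or None if nothing is mounted there."""
    wanted = mount_point.rstrip("/")  # treat "/mnt/mydata" and "/mnt/mydata/" as the same
    for m in mounts:
        if m.mountPoint.rstrip("/") == wanted:
            return m.source
    return None


# Hand-written sample data; the storage account name is a placeholder.
sample = [MountInfo("/mnt/mydata", "abfss://raw@mystorageacct.dfs.core.windows.net/")]
print(find_mount_source(sample, "/mnt/mydata/"))
print(find_mount_source(sample, "/mnt/other"))
```

In a notebook, the same helper would be called as find_mount_source(dbutils.fs.mounts(), "/mnt/mydata").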

🚀 How to Mount an ADLS Gen2 Container to Databricks

Mounting an ADLS Gen2 container requires credentials: either a storage account access key or a service principal.

🔑 Example: Mount Using Access Key

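# Read the storage account key from a secret scope rather than pasting it into the notebook.
# "storage-access-key" is a placeholder key name; "myscope" is the scope used in the next example.
access_key = dbutils.secrets.get(scope="myscope", key="storage-access-key")

dbutils.fs.mount(
  source = "abfss://<container-name>@<storage-account>.dfs.core.windows.net/",
  mount_point = "/mnt/mydata",
  extra_configs = {
    "fs.azure.account.key.<storage-account>.dfs.core.windows.net": access_key
  }
)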

🛡️ Example: Mount Using Service Principal

configs = {
  "fs.azure.account.auth.type": "OAuth",
  "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id": "<client-id>",
  "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="myscope", key="client-secret"),
  "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token"
}

dbutils.fs.mount(
  source = "abfss://<container-name>@<storage-account>.dfs.core.windows.net/",
  mount_point = "/mnt/mydata",
  extra_configs = configs
)
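
Calling dbutils.fs.mount on a path that is already mounted typically raises an error, so notebooks that run repeatedly often check first. The following is one possible sketch of that pattern, using the same configuration keys as the service principal example. The helper names abfss_source, oauth_configs, and mount_if_absent are hypothetical, and dbutils is passed in as a parameter so the logic can be exercised outside Databricks.

```python
from typing import Dict


def abfss_source(container: str, storage_account: str) -> str:
    """Build the abfss:// URI for the root of an ADLS Gen2 container."""
    return f"abfss://{container}@{storage_account}.dfs.core.windows.net/"


def oauth_configs(client_id: str, client_secret: str, tenant_id: str) -> Dict[str, str]:
    """Build the OAuth settings for a service principal (same keys as the example above)."""
    return {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


def mount_if_absent(dbutils, source: str, mount_point: str, configs: Dict[str, str]) -> bool:
    """Mount source at mount_point unless it is already mounted. Returns True if a mount was created."""
    if any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
        return False
    dbutils.fs.mount(source=source, mount_point=mount_point, extra_configs=configs)
    return True
```

In a notebook this could be called as mount_if_absent(dbutils, abfss_source("<container-name>", "<storage-account>"), "/mnt/mydata", oauth_configs("<client-id>", dbutils.secrets.get(scope="myscope", key="client-secret"), "<tenant-id>")).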

🧼 How to Unmount a Container

dbutils.fs.unmount("/mnt/mydata")

🔍 How to List Files

display(dbutils.fs.ls("/mnt/mydata"))

Or:

%fs ls /mnt/mydata
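
Both forms list a single directory level. For nested folders, a short recursive walk on top of dbutils.fs.ls is a common approach. The sketch below assumes each returned entry exposes path, size, and an isDir() method, as the FileInfo objects from dbutils.fs.ls do. The name walk_files is hypothetical, and the listing function is passed in as a parameter so the code is not tied to a notebook.

```python
from typing import Callable, Iterator


def walk_files(ls: Callable, root: str) -> Iterator:
    """Yield every file entry under root, descending into subdirectories.

    ls is any callable that behaves like dbutils.fs.ls: it takes a path and
    returns entries with .path, .size and .isDir().
    """
    pending = [root]
    while pending:
        current = pending.pop()
        for entry in ls(current):
            if entry.isDir():
                pending.append(entry.path)  # visit this directory later
            else:
                yield entry
```

In a notebook, sum(f.size for f in walk_files(dbutils.fs.ls, "/mnt/mydata")) would give the total size in bytes of all files under the mount. On very large directory trees this issues one listing call per folder, so it can be slow.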

📊 Use Cases for Mounts

Use Case                         Mount Usage
Accessing Azure Data Lake        ✅ Recommended
Long-term storage                ✅ Recommended
Temporary scratchpad             ❌ Use DBFS Root
Production pipelines             ✅ Recommended
Shared datasets between teams    ✅ Recommended

📘 Summary

Concept          Description
DBFS             Virtual file system in Databricks backed by Blob Storage
DBFS Root        Default internal storage: good for scratchpad use, not for production data
Mounts           Persistent links to external storage (like ADLS) using a /mnt/... path
Access Method    Use access keys or service principals (prefer secrets for security)
Best Practice    Secure credentials with a Databricks secret scope or Azure Key Vault
