Mohammad Gufran Jahangir, April 20, 2025

🗂️ Databricks File System (DBFS) and Mounting Azure Data Lake Containers

Databricks offers a virtual distributed file system called DBFS, which simplifies data access for notebooks, jobs, and clusters. To integrate external cloud storage such as Azure Data Lake Storage Gen2 (ADLS Gen2), you use mounts, which let you navigate and work with cloud data as if it were part of a local filesystem.

Let's dive into what DBFS is, how mounts work, and how to securely mount ADLS containers.


📦 What is Databricks File System (DBFS)?

DBFS is a distributed file system built on top of Azure Blob Storage, enabling seamless access to data stored in the Databricks workspace.

📌 Key Characteristics:

  • DBFS is mounted into the Databricks workspace root.
  • It acts as a unified interface for both ephemeral (cluster-local) and persistent storage.
  • Can be accessed from notebooks, clusters, and jobs.
  • Includes directories such as /mnt and /databricks; the same tree appears under /dbfs when accessed through local file APIs on a cluster node.
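
The path forms are easy to mix up. As a minimal sketch, assuming the usual layout in which the `dbfs:/` scheme used by Spark and dbutils corresponds to the `/dbfs` prefix seen by local file APIs on a cluster node, the two helpers below (hypothetical names, not part of any Databricks API) translate between them:

```python
def to_local_path(dbfs_uri: str) -> str:
    """Convert 'dbfs:/mnt/x' (Spark/dbutils form) to '/dbfs/mnt/x' (local file API form)."""
    if not dbfs_uri.startswith("dbfs:/"):
        raise ValueError(f"not a dbfs:/ URI: {dbfs_uri}")
    return "/dbfs/" + dbfs_uri[len("dbfs:/"):].lstrip("/")


def to_dbfs_uri(local_path: str) -> str:
    """Convert '/dbfs/mnt/x' (local file API form) to 'dbfs:/mnt/x' (Spark/dbutils form)."""
    if local_path != "/dbfs" and not local_path.startswith("/dbfs/"):
        raise ValueError(f"not under /dbfs: {local_path}")
    return "dbfs:/" + local_path[len("/dbfs"):].lstrip("/")
```

For example, a file that Spark reads as dbfs:/mnt/mydata/file.csv would be opened with Python's built-in open() at /dbfs/mnt/mydata/file.csv.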

DBFS Architecture Overview

Control Plane

  • Managed by Databricks
  • Includes workspace UX, cluster manager, job scheduler

Data Plane

  • Resides in your Azure subscription
  • Includes VMs, Spark compute, and actual storage access
  • ADLS and Azure Blob act as the backend

What is DBFS Root?

The DBFS Root (dbfs:/, which appears as /dbfs/ to local file APIs on a cluster) is the default storage location for files in a Databricks workspace, backed by Azure Blob Storage.

✅ Features:

  • Accessible via the Databricks Web UI (e.g., Data tab > DBFS)
  • Ideal for notebooks, libraries, input/output files
  • Used by default when saving data temporarily

❗ Considerations:

  • Not recommended for long-term production storage
  • Use mounts instead for accessing external storage like ADLS Gen2

🔗 What Are Databricks Mounts?

A mount is a persistent link between external cloud storage (such as Azure Data Lake or Azure Blob) and a specific directory in DBFS, conventionally under /mnt.

Once mounted:

  • You can use regular file system commands (%fs ls, dbutils.fs.ls(), etc.)
  • You don't need to re-authenticate or manage tokens repeatedly
  • Mounts persist across sessions and clusters
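
Because mounts persist, a notebook that is re-run will usually find the mount already in place. A minimal sketch of checking for that is shown below; it assumes the records returned by dbutils.fs.mounts() expose a mountPoint attribute, and is_mounted is a hypothetical helper name.

```python
def is_mounted(mounts, mount_point: str) -> bool:
    """Return True if mount_point appears in a list of mount records.

    `mounts` is expected to be the result of dbutils.fs.mounts(); each record
    is assumed to have a `mountPoint` attribute such as '/mnt/mydata'.
    """
    target = mount_point.rstrip("/")
    return any(m.mountPoint.rstrip("/") == target for m in mounts)


# In a Databricks notebook (dbutils is only available there):
# if not is_mounted(dbutils.fs.mounts(), "/mnt/mydata"):
#     ...  # call dbutils.fs.mount(...) here
```

Passing the mount list in as an argument keeps the check easy to exercise outside a notebook.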

🚀 How to Mount an ADLS Gen2 Container to Databricks

Mounting ADLS requires proper authentication using either access keys or service principals.

🔑 Example: Mount Using Access Key

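# Mount the container at /mnt/mydata using the storage account access key.
# Prefer reading <access-key> from a secret scope over pasting a literal key.
dbutils.fs.mount(
  source = "abfss://<container-name>@<storage-account>.dfs.core.windows.net/",
  mount_point = "/mnt/mydata",
  extra_configs = {
    "fs.azure.account.key.<storage-account>.dfs.core.windows.net": "<access-key>"
  }
)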
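
To keep the key out of the notebook, the same mount can be driven from a secret scope. The sketch below is one possible arrangement, not a definitive implementation: the helper names, and the scope and key names passed in, are hypothetical, and dbutils is passed as an argument so the pure helpers can be used outside Databricks.

```python
def abfss_source(container: str, storage_account: str) -> str:
    """Build the abfss:// URI for the root of an ADLS Gen2 container."""
    return f"abfss://{container}@{storage_account}.dfs.core.windows.net/"


def account_key_config(storage_account: str, access_key: str) -> dict:
    """Build the extra_configs entry that carries the storage account key."""
    return {
        f"fs.azure.account.key.{storage_account}.dfs.core.windows.net": access_key
    }


def mount_with_key(dbutils, container, storage_account, mount_point, scope, key_name):
    """Mount a container, reading the access key from a secret scope."""
    access_key = dbutils.secrets.get(scope=scope, key=key_name)
    dbutils.fs.mount(
        source=abfss_source(container, storage_account),
        mount_point=mount_point,
        extra_configs=account_key_config(storage_account, access_key),
    )
```

In a notebook this might be called as mount_with_key(dbutils, "<container-name>", "<storage-account>", "/mnt/mydata", "myscope", "storage-key"), where "storage-key" is an assumed secret name.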

🛡️ Example: Mount Using Service Principal

# OAuth settings for a service principal.
# The client secret is read from a Databricks secret scope rather than hard-coded.
configs = {
  "fs.azure.account.auth.type": "OAuth",
  "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id": "<client-id>",
  "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="myscope", key="client-secret"),
  "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token"
}

dbutils.fs.mount(
  source = "abfss://<container-name>@<storage-account>.dfs.core.windows.net/",
  mount_point = "/mnt/mydata",
  extra_configs = configs
)
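
When several containers are mounted with the same service principal, the configuration dictionary above can be produced by a small function. This is a sketch under that assumption; oauth_configs is a hypothetical helper name, and the keys and endpoint format are copied from the example above.

```python
def oauth_configs(client_id: str, client_secret: str, tenant_id: str) -> dict:
    """Build the extra_configs dictionary for a service principal (OAuth) mount."""
    return {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


# In a Databricks notebook (dbutils is only available there):
# secret = dbutils.secrets.get(scope="myscope", key="client-secret")
# configs = oauth_configs("<client-id>", secret, "<tenant-id>")
```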

🧼 How to Unmount a Container

dbutils.fs.unmount("/mnt/mydata")
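
Unmounting a path that is not currently mounted raises an error, so cleanup code often checks first. A minimal sketch, assuming dbutils.fs.mounts() returns records with a mountPoint attribute (unmount_if_mounted is a hypothetical helper name):

```python
def unmount_if_mounted(dbutils, mount_point: str) -> bool:
    """Unmount mount_point only if it is currently mounted.

    Returns True if an unmount was performed, False if there was nothing to do.
    """
    mounted = {m.mountPoint for m in dbutils.fs.mounts()}
    if mount_point in mounted:
        dbutils.fs.unmount(mount_point)
        return True
    return False


# In a Databricks notebook:
# unmount_if_mounted(dbutils, "/mnt/mydata")
```

Other clusters that are already running may need dbutils.fs.refreshMounts() before they notice the change.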

How to List Files

display(dbutils.fs.ls("/mnt/mydata"))

Or:

%fs ls /mnt/mydata
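
The listing returned by dbutils.fs.ls() can also be processed in Python. As a rough sketch, assuming each entry exposes name and size attributes, the hypothetical helper below counts the files with a given suffix and totals their size in bytes:

```python
def summarize_listing(files, suffix: str = ".csv"):
    """Return (count, total_bytes) for entries whose name ends with suffix.

    `files` is expected to be the result of dbutils.fs.ls(path); each entry
    is assumed to have `name` and `size` attributes.
    """
    matching = [f for f in files if f.name.endswith(suffix)]
    return len(matching), sum(f.size for f in matching)


# In a Databricks notebook:
# count, total_bytes = summarize_listing(dbutils.fs.ls("/mnt/mydata"))
```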

📊 Use Cases for Mounts

Use Case                      | Mount Usage
Accessing Azure Data Lake     | ✅ Recommended
Long-term storage             | ✅ Recommended
Temporary scratchpad          | ❌ Use DBFS Root
Production pipelines          | ✅ Recommended
Shared datasets between teams | ✅ Recommended

📘 Summary

Concept       | Description
DBFS          | Virtual file system in Databricks backed by Blob Storage
DBFS Root     | Default internal storage – good for scratchpad, not for production data
Mounts        | Permanent links to external storage (like ADLS) using /mnt/... path
Access Method | Use Access Keys or Service Principals (prefer secrets for security)
Best Practice | Secure credentials with Databricks Secret Scope or Azure Key Vault
