Mohammad Gufran Jahangir, August 9, 2025

Databricks separates management from data processing to improve security, scalability, and compliance. It achieves this through two main components: the Control Plane and the Data Plane.


1. Control Plane – Managed by Databricks

The Control Plane is hosted and managed entirely by Databricks in their cloud environment.
It contains:

  • Web Application – The Databricks workspace UI that users interact with.
  • Notebook Content – The code and queries in your notebooks, stored and versioned by Databricks (execution itself happens in the Data Plane).
  • Cluster Configuration – Stores metadata and setup details for clusters.
  • Job Information – Scheduling and execution metadata for ETL pipelines and analytics jobs.

Key Role:
Handles orchestration, configuration, and metadata management — but does not store your actual customer data.
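
To make this concrete: when you trigger a job, your request goes to the Control Plane's REST API, which records job metadata and dispatches the work. A minimal sketch of what such a request looks like, using the Databricks Jobs API 2.1 `run-now` endpoint (the workspace URL, token, and job ID below are hypothetical placeholders):

```python
# Sketch: triggering a job through the Control Plane's REST API.
# Only orchestration metadata travels here -- no customer data.
import json

def build_run_now_request(workspace_url, token, job_id):
    """Build the HTTP request the Control Plane receives to launch a job."""
    return {
        "url": f"{workspace_url}/api/2.1/jobs/run-now",
        "headers": {"Authorization": f"Bearer {token}"},
        "body": json.dumps({"job_id": job_id}),
    }

request = build_run_now_request(
    "https://adb-1234567890123456.7.azuredatabricks.net",  # hypothetical workspace
    "dapi-example-token",                                   # hypothetical token
    job_id=42,                                              # hypothetical job ID
)
print(request["url"])
```

Note that the payload contains only identifiers and credentials for orchestration; the actual data the job reads never passes through this API.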


2. Data Plane – In Your Cloud Account

The Data Plane runs inside your own cloud environment (AWS, Azure, or GCP) where your data resides.
It contains:

  • Clusters & Compute Nodes – Provisioned in your cloud account to run workloads.
  • Client Data – The raw and processed datasets.
  • Data Lake Storage – Your cloud storage (S3, ADLS, GCS) where data is read/written.
  • Logs and Temporary Files – Generated during processing.

Key Role:
Executes jobs and processes your actual data without it ever leaving your cloud account.
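
As an illustration of how compute lands in your account: a cluster spec is submitted to the Control Plane, but the instances it describes are provisioned inside your own cloud account. A sketch assuming AWS, with all identifiers as hypothetical placeholders:

```python
# Sketch: a cluster spec sent to the Control Plane (Clusters API fields).
# The EC2 instances it describes run inside YOUR AWS account -- the Data Plane.
# All names, versions, and ARNs below are hypothetical placeholders.
cluster_spec = {
    "cluster_name": "etl-cluster",
    "spark_version": "13.3.x-scala2.12",   # hypothetical runtime version
    "node_type_id": "i3.xlarge",           # EC2 instance type in your account
    "num_workers": 4,
    "aws_attributes": {
        # An IAM instance profile from your account lets the cluster read and
        # write your S3 data directly, without routing it through Databricks.
        "instance_profile_arn": (
            "arn:aws:iam::123456789012:instance-profile/databricks-s3-access"
        ),
    },
}

# Once running, the cluster reads your storage directly, e.g. in a notebook:
#   spark.read.format("delta").load("s3://your-bucket/path")
print(cluster_spec["node_type_id"])
```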


3. How They Work Together

  • The Control Plane sends instructions (via APIs) to the Data Plane.
  • The Data Plane executes code, queries, and ML models against your data.
  • Metadata and job status are sent back to the Control Plane for monitoring.
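
The round trip above can be modeled in a few lines. This is a toy simulation of the handshake, not the real protocol (Databricks uses authenticated REST/RPC calls); it only illustrates that instructions flow one way and metadata flows back:

```python
# Toy simulation of the Control Plane / Data Plane handshake.

class DataPlane:
    """Runs in your cloud account; holds and processes the actual data."""
    def __init__(self, data):
        self.data = data  # customer data never leaves this object

    def execute(self, instruction):
        # Run the instruction locally; return only status and metadata.
        if instruction == "count_rows":
            return {"status": "SUCCESS", "metadata": {"row_count": len(self.data)}}
        return {"status": "FAILED", "metadata": {"error": "unknown instruction"}}

class ControlPlane:
    """Managed by Databricks; orchestrates jobs and stores only metadata."""
    def __init__(self, data_plane):
        self.data_plane = data_plane
        self.job_log = []  # job history: metadata only, no customer data

    def run_job(self, instruction):
        report = self.data_plane.execute(instruction)  # instruction sent "down"
        self.job_log.append({"instruction": instruction, **report})  # status back "up"
        return report["status"]

dp = DataPlane(data=["row1", "row2", "row3"])
cp = ControlPlane(dp)
print(cp.run_job("count_rows"))  # prints "SUCCESS"; only metadata flowed back
```

Notice that `ControlPlane.job_log` ends up holding row counts and statuses, never the rows themselves, mirroring the monitoring flow described above.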

4. Benefits of This Architecture

  • Security – Your data stays in your cloud (Data Plane).
  • Performance – Processing happens close to where the data lives.
  • Governance – Control Plane centralizes configuration and permissions.
  • Scalability – Easily manage multiple workspaces and workloads.
