Databricks separates management from data processing for better security, scalability, and compliance. This separation is achieved through two main components: the Control Plane and the Data Plane.

1. Control Plane – Managed by Databricks
The Control Plane is hosted and managed entirely by Databricks in their cloud environment.
It contains:
- Web Application – The Databricks workspace UI that users interact with.
- Notebook Content – The source of your notebooks: code, queries, and commands (not the data they operate on).
- Cluster Configuration – Stores metadata and setup details for clusters.
- Job Information – Scheduling and execution metadata for ETL pipelines and analytics jobs.
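To make the cluster-configuration point concrete, here is a minimal sketch of the kind of cluster spec the Control Plane stores as metadata. The field names follow the Databricks Clusters API, but the specific values (runtime label, node type, worker counts) are illustrative placeholders only.

```python
# A sketch of cluster metadata held by the Control Plane.
# Field names follow the Databricks Clusters API; values are placeholders.
cluster_config = {
    "cluster_name": "etl-cluster",
    "spark_version": "13.3.x-scala2.12",   # placeholder runtime label
    "node_type_id": "i3.xlarge",           # node type IDs vary by cloud
    "autoscale": {"min_workers": 2, "max_workers": 8},
    "spark_conf": {"spark.sql.shuffle.partitions": "200"},
}

# The Control Plane stores this spec; the actual VMs it describes
# are provisioned in the Data Plane, inside your cloud account.
print(sorted(cluster_config))
```

Note that this dictionary is pure metadata: it describes compute, but the compute itself runs in your account.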
Key Role:
Handles orchestration, configuration, and metadata management — but does not store your actual customer data.
2. Data Plane – In Your Cloud Account
The Data Plane runs inside your own cloud environment (AWS, Azure, or GCP) where your data resides.
It contains:
- Clusters & Compute Nodes – Provisioned in your cloud account to run workloads.
- Client Data – The raw and processed datasets.
- Data Lake Storage – Your cloud storage (S3, ADLS, GCS) where data is read/written.
- Logs and Temporary Files – Generated during processing.
Key Role:
Executes jobs and processes your actual data without it ever leaving your cloud account.
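The data-lake storage mentioned above is addressed by cloud-specific URI schemes. A small sketch, with placeholder bucket and container names, shows how the scheme alone identifies which cloud's storage the Data Plane is reading from:

```python
from urllib.parse import urlparse

def storage_cloud(path: str) -> str:
    """Map a data-lake URI scheme to the cloud storage service it targets."""
    scheme = urlparse(path).scheme
    return {
        "s3": "AWS S3",
        "abfss": "Azure Data Lake Storage Gen2",
        "gs": "Google Cloud Storage",
    }.get(scheme, "unknown")

# Bucket/container/account names below are hypothetical placeholders.
print(storage_cloud("s3://my-bucket/raw/events/"))
print(storage_cloud("abfss://container@acct.dfs.core.windows.net/raw/"))
print(storage_cloud("gs://my-bucket/raw/events/"))
```

Whichever scheme is used, the reads and writes happen on compute nodes in your account, next to the data.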
3. How They Work Together
- The Control Plane sends instructions (via APIs) to the Data Plane.
- The Data Plane executes code, queries, and ML models against your data.
- Metadata and job status are sent back to the Control Plane for monitoring.
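The instruction flow above can be sketched with the Databricks Jobs REST API, which the Control Plane exposes. This builds (but deliberately does not send) a "run now" request; the workspace URL and token are placeholders, and triggering the endpoint would cause the job to execute on Data Plane compute.

```python
import json
import urllib.request

# Placeholders -- substitute your own workspace URL and access token.
WORKSPACE_URL = "https://example.cloud.databricks.com"
API_TOKEN = "dapi-REDACTED"

def build_run_now_request(job_id: int) -> urllib.request.Request:
    """Build (but do not send) a Jobs API 'run now' request.

    The Control Plane exposes endpoints such as /api/2.1/jobs/run-now;
    invoking one triggers execution on clusters in the Data Plane, and
    job status flows back to the Control Plane for monitoring.
    """
    payload = json.dumps({"job_id": job_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{WORKSPACE_URL}/api/2.1/jobs/run-now",
        data=payload,
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_run_now_request(job_id=123)
print(req.get_method(), req.get_full_url())
```

Only this control traffic crosses the boundary; the datasets themselves stay in your cloud account.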
4. Benefits of This Architecture
- Security – Your data stays in your cloud (Data Plane).
- Performance – Processing happens close to where the data lives.
- Governance – Control Plane centralizes configuration and permissions.
- Scalability – Easily manage multiple workspaces and workloads.